REMARKS

Claims 53-58, 71-73 and 86-94 are pending. Claims 53-58, 88 and 90-94 are amended, 
and new claim 95 is added. No new matter has been added by the amendments. 

Election/Restrictions 

The Applicants note the finality of the Restriction Requirement. The Applicants however 
respectfully request rejoinder of at least the species SEQ ID NOs: 22 and 23 in view of the 
submissions herein. 

Specification 

The disclosure is objected to for containing an embedded hyperlink and/or other form of 
browser-executable code at p. 54-55. In response, the paragraph spanning p. 54-55 has been 
amended to remove the browser-executable code. 

Rejections Under 35 U.S.C. § 112 

I. Claims 53-58, 71-73 and 86-94 are rejected under 35 U.S.C. 112, first paragraph, as
allegedly failing to comply with the written description requirement. More specifically, while the
Examiner acknowledges that the specification describes an actual reduction to practice of SEQ ID
NO: 24, the Examiner asserts that the claims are drawn to a large genus of variants of SEQ ID
NO: 24, and that the Applicants were allegedly not in possession of this genus with respect to
fragments or variants or substantially identical sequences.

Claims 53-55, which are the only pending independent claims, have been amended without 
prejudice or disclaimer and without acquiescence to the Examiner's assertions, to recite that the 
claimed polypeptide comprises an amino acid sequence having at least 75% sequence identity to 
SEQ ID NO: 24 and to recite an "immunogenic fragment." Support for these amendments may
be found throughout the specification at, for example, page 13, lines 6-28; page 18, line 26 to page
19, line 6; or page 30, line 12, to page 31, line 23. The Applicants respectfully submit that the
specification describes at least three variants of SEQ ID NO: 24, i.e., NleA polypeptides from 
enteropathogenic E. coli (EPEC) and from C. rodentium, as well as from enterohemorrhagic E. 
coli (EHEC), which fall within the presently claimed sequence identity, and therefore describes a 
representative number of species with respect to the claimed genus. Furthermore, recitation of an 
"immunogenic fragment" clarifies that only fragments capable of eliciting an immune response are 
contemplated. Accordingly, this rejection should be withdrawn. 

II. Claims 53-58, 71-73 and 86-94 are rejected under 35 U.S.C. 112, first paragraph, as 
allegedly failing to comply with the enablement requirement. More specifically, while the 
Examiner acknowledges that the specification is enabling for a method for eliciting an immune 
response against an enterohemorrhagic E. coli (EHEC) or SEQ ID NO: 24 (which is
characterized as a component of enterohemorrhagic E. coli 0157:H7), or for reducing
colonization or shedding of EHEC, or for treating EHEC infection, in an animal by administering
an effective amount of a composition or cell culture supernatant including a polypeptide which
comprises the amino acid sequence set forth in SEQ ID NO: 24, the Examiner alleges that other 
aspects of the claimed invention are not enabled. In particular, the Examiner alleges that the 
specification does not reasonably provide enablement for preventing infection by EHEC in an 
animal by administering an effective amount of a composition or cell culture supernatant including 
a polypeptide which comprises the amino acid sequence set forth in SEQ ID NO: 24. The 
Examiner also alleges that the specification does not reasonably provide enablement for the 
claimed methods with respect to any other A/E pathogen, or component thereof, or for another 
polypeptide comprising an amino acid sequence substantially identical to the sequence of SEQ ID 
NO: 24 or a fragment or variant thereof.

Claims 53-55, which are the only pending independent claims, have been amended without 
prejudice or disclaimer and without acquiescence to the Examiner's assertions, to recite that the 
claimed polypeptide comprises an amino acid sequence having at least 75% sequence identity to
SEQ ID NO: 24 and to recite an "immunogenic fragment." In addition, the present claims do not
recite prevention of infection. 

As indicated herein, the specification describes at least three variants of SEQ ID NO: 24,
i.e., NleA polypeptides from EPEC and from C. rodentium and therefore describes a 
representative number of species with respect to the claimed genus. Furthermore, one of ordinary 
skill in the art would also be able to readily identify NleA variants as, for example, evidenced in 
the enclosed publication by Creuzburg and Schmidt (J. Clin. Microbiol. 2498-2507, 2007) in 
which a large number of NleA variants were detected after the initial identification of NleA as a 
virulence factor by the inventors of the above-referenced application. 

The Applicants also respectfully submit that the specification demonstrates the effect of 
NleA in a C. rodentium mouse model of disease and that one of ordinary skill in the art would be 
able to readily apply the claimed methods to A/E pathogens, as claimed. Accordingly, one of 
ordinary skill in the art would be able to readily identify variants and immunogenic fragments of 
NleA proteins, and to use the claimed methods in connection with A/E pathogens, and the 
Applicants respectfully request withdrawal of this rejection. 

Rejections Under 35 U.S.C. § 102 

I. Claims 53-58, 71-72 and 86-94 are rejected under 35 U.S.C. 102(b) as allegedly 
anticipated by Finlay et al. (WO 02/053181) as evidenced by Hideo et al. (JP20023550742A2, 
partial translation and sequence alignment attached as Appendix B and Appendix A, respectively, 
of the Office Action). 

More specifically, the Examiner alleges that Finlay et al. teach methods for eliciting an
immune response against an A/E pathogen or component thereof, or for reducing colonization of
an A/E pathogen, or of reducing shedding (thus allegedly treating an infection by an A/E
pathogen) in an animal by administering an effective amount of a composition comprising a
culture supernatant. The Examiner further alleges that Hideo et al. teach that E. coli EHEC

Application Serial No. 10/577,742 
Office Action dated August 5, 2010 

0157:H7 makes a protein comprising the sequence of SEQ ID NO: 24. The Office Action
further alleges that the culture supernatant of Finlay et al. is prepared from E. coli EHEC
0157:H7 under identical conditions as SEQ ID NO: 24 of the instant specification. 

The Examiner therefore concludes that the culture supernatant of Finlay et al. is a
composition or culture supernatant which comprises a polypeptide which comprises an amino acid 
sequence substantially identical to the sequence of SEQ ID NO: 24 and inherently comprises 20% 
of the cell protein present in the composition. The Examiner is respectfully requested to clarify
"inherently comprises 20% of the cell protein present in the composition" in the context of this
rejection. 

To support a rejection under § 102, a single prior art reference must describe each and 
every element, either expressly or inherently, of the rejected claims (MPEP § 2131). In the 
present case, claims 53-55, which are the only pending independent claims, have been amended 
without prejudice or disclaimer and without acquiescence to the Examiner's assertions, to recite 
that the claimed polypeptide is "isolated." The term "isolated" as defined in the specification 
refers to a compound that is "separated from the components that naturally accompany it" (see, 
for example, the specification at page 10, lines 20-21). The Applicants respectfully submit that 
Finlay et al. do not teach methods relating to an "isolated" polypeptide comprising an amino acid
sequence having at least 75% sequence identity to the sequence of SEQ ID NO: 24 as claimed 
and therefore do not anticipate the claimed invention. 

II. Claims 53-58, 71-72, 86, and 88-94 are rejected under 35 U.S.C. 102(b) as allegedly
anticipated by Wright (US 5,730,989, 3/24/98) as evidenced by Hideo et al. (supra).

More specifically, the Examiner alleges that Wright discloses a method for eliciting an
immune response against E. coli 0157:H7 or component thereof, in an animal by administering to
the animal an effective amount of inactivated E. coli 0157:H7. The Examiner further alleges that
the E. coli 0157:H7 of Wright is a composition that comprises a polypeptide which comprises an
amino acid sequence substantially identical to the sequence of SEQ ID NO: 24, as evidenced by
Hideo et al., and that Wright discloses a method for treating E. coli infection, thus treatment of the
E. coli infection will allegedly result in reduction in colonization and shedding of E. coli in an
animal. 

As indicated herein, to support a rejection under § 102, a single prior art reference must 
describe each and every element, either expressly or inherently, of the rejected claims (MPEP §
2131). In the present case, claims 53-55, which are the only pending independent claims, have 
been amended without prejudice or disclaimer and without acquiescence to the Examiner's 
assertions, to recite that the claimed polypeptide is "isolated." The Applicants respectfully submit
that Wright et al. do not teach methods relating to use of an "isolated" polypeptide comprising an
amino acid sequence having at least 75% sequence identity to the sequence of SEQ ID NO: 24 as 
claimed and therefore do not anticipate the claimed invention. 

III. Claims 53-55, 71-72, 86 and 90 are rejected under 35 U.S.C. 102(b) as allegedly
anticipated by Hideo et al. (supra) as evidenced by Wright et al. (supra).

More specifically, the Examiner alleges that Hideo et al. disclose a method of eliciting an
immune response against E. coli 0157:H7 by administering an effective amount of a composition
for inducing an immune response against E. coli 0157:H7 comprising a protein 100% identical to
SEQ ID NO: 24. The Examiner further alleges that Hideo et al. also disclose treating an infection
by E. coli 0157:H7 using the composition and concludes that treatment of the E. coli infection
will result in reduction in colonization and shedding of E. coli in an animal. The Examiner further
alleges that it is inherent that the methods of Hideo et al. are to be practiced in animals since
Wright et al. teach that E. coli 0157:H7 infects animals.

This rejection is respectfully traversed. As indicated herein, claims 53-55 are the only 
pending independent claims, and therefore these rejections will be addressed with respect to these 
claims only. The remaining claims at issue under these rejections are dependent claims and by 
definition subject to the limitations of claims 53, 54 or 55. Claims 53-55 are directed to methods 
for eliciting an immune response against an A/E pathogen or component thereof, or for reducing
colonization or shedding of an A/E pathogen, in an animal by administering an effective amount of
a composition or culture supernatant including an isolated polypeptide comprising an amino acid
sequence having at least 75% sequence identity to the sequence of SEQ ID NO: 24.

The Applicants reiterate that, to support a rejection under § 102, a single prior art 
reference must describe each and every element, either expressly or inherently, of the rejected 
claims (MPEP § 2131) and the prior art reference must be enabling:

"...invalidity based on anticipation requires that the assertedly anticipating disclosure
enable the subject matter of the reference and thus of the patented invention without
undue experimentation." Elan Pharmaceuticals, Inc. v. Mayo Foundation for Medical
Education & Research, 346 F.3d 1051, 68 USPQ2d 1372 (Fed. Cir. 2003), hereafter
"Elan," emphasis added.

Hideo et al. do not meet these requirements, as discussed herein.

Hideo et al. teach nucleotide sequences from enterohemorrhagic E. coli 0157:H7 SAKAI
(referred to hereafter as the "EHEC sequences") and assert that these sequences are not present in
non-pathogenic E. coli K12 (see page 40, paragraph [0010], and page 71, paragraph [0014], of
the enclosed English translation of Hideo et al.). Hideo et al. also teach predicted amino acid
sequences based on the identified nucleotide sequences and comparison of the amino acid
sequences to known sequences from various public databases using known algorithms (see page
71, paragraph [0016] of the enclosed English translation of Hideo et al.). Hideo et al. classify the
predicted amino acid sequences into twelve (12) groups (see pages 71-72, paragraph [0017] of
the enclosed English translation of Hideo et al.), as follows: 1) Proteins having unknown function
etc., 2) Proteins which have unknown function, but have significant homology to that of other 
bacteria, 3) Proteins comprising Insertion Sequences (IS), 4) Proteins derived from phage, 5) 
Regulatory elements, 6) Proteins relating to fimbriae, 7) Proteins relating to transportation of 
substance, 8) Proteins relating to synthesis of lipopolysaccharide, 9) Proteins relating to 
metabolism, 10) Proteins processing DNA/RNA, 11) Proteins relating to pathogenicity, and 12) 
Other proteins.

Hideo et al. also teach that:

. . .a protein predicted to be a cell surface protein (membrane protein, especially, OMP,
lipoprotein) in them or its gene (or nucleic-acid molecule) may be useful for production of
an antibody, vaccine composition, diagnosis of 0-157 infection . . . Furthermore,
there is a possibility that they include a protein which has an important function in 0-157,
for example, transportation and metabolism of a substance, processing of nucleic acids,
and relates to a regulatory element and pathogenicity. They are to be useful for diagnosis
and therapy of 0-157 infection. (see page [illegible], paragraph [0031] of the enclosed
English translation of Hideo et al., emphasis added)



0-157 specific nucleic-acid molecule of the present invention, a gene included in it,
peptide and nucleic-acid sequence encoded by the gene are useful for diagnosis and/or
therapy of 0-157 infection and prevention of symptom occurred by the infection. They
can also be used for detection of the presence of 0-157 in a sample and classification of its
strain. Furthermore, they can also be used for screening of useful compounds for
prevention and/or therapy of 0-157 infection and symptom occurred by the infection. (see
page 283, paragraph [0047] of the enclosed English translation of Hideo et al., emphasis
added)

the scope of the present invention includes a vaccine composition including genes and/or
polynucleotides of the present invention, and a method for prevention and/or therapy of
0-157 infection and symptom occurred by the infection. (see page 283, paragraph [0048]
of the enclosed English translation of Hideo et al., emphasis added)



The present invention relates to a peptide vaccine for prevention and/or therapy of
0-157 infection comprising effective amount of, at least one kind of, 0-157 specific
polypeptides having amino acid sequence set forth in the sequence lists or fragments
thereof. The vaccine formulation preferably includes a pharmaceutically acceptable
carrier, for example, a known adjuvant in the art. (see page 284, paragraph [0051] of the
enclosed English translation of Hideo et al., emphasis added)

The present invention relates to a method of reducing the risk of 0-157 infection in
patients or a method for therapy [of the infection]. The method comprises administration
of the vaccine of the present invention to a patient so as to reduce the risk of
0-157 infection or provide therapy of infection. (see page 285, paragraph [0053] of the
enclosed English translation of Hideo et al., emphasis added).

The Applicants respectfully submit that these teachings of Hideo et al. do not rise to the
level of anticipation of the claimed invention. Firstly, SEQ ID NO: 393 of Hideo et al., which is
asserted to be identical to SEQ ID NO: 24 of the instant application, is called out in Group 2
(Proteins which have unknown function, but have significant homology to that of other bacteria)
and is described as follows at page 154 of the enclosed English translation of Hideo et al.:

SEQ ID NO: 393 -0.239229, 442, a minor capsid protein precursor, similar to minor
capsid protein precursors for example, GpC [Bacteriophage lambda]
gi|137565|sp|P03711|VCAC_LAMBD (97% identity in 439 amino acids), capsid assembly
protein containing Nu3-homolog;

The Applicants respectfully note that Hideo et al. do not identify SEQ ID NO: 393 as 
relating to pathogenicity; such sequences are listed in Group 11. The Applicants respectfully
submit that capsid proteins are bacteriophage (bacterial virus) proteins used as part of the viral 
assembly process, and present in the viral coat upon maturation. Accordingly, a bacteriophage 
capsid protein would not be expected to be effective in the methods as claimed. 

Secondly, Hideo et al. speculate that a protein that is "predicted to be a cell surface
protein . . . may be useful for production of an antibody, vaccine composition, diagnosis of 0-157
infection . . .," that ". . . there is a possibility that they include a protein which has an important
function in 0-157, for example, transportation and metabolism of a substance, processing of
nucleic acids, and relates to a regulatory element and pathogenicity..." and that such proteins ". . .
are to be useful for diagnosis and therapy of 0-157 infection."

Accordingly, Hideo et al. simply raise the possibility that some of approximately two 
thousand (2000) sequences may be useful. This assumption appears to be based on the absence of 
the EHEC sequences from E. coli K12. Hideo et al. compare the sequence of the pathogenic
bacterium, EHEC 0157:H7, with that of the non-pathogenic K12 strain, and assert that the
EHEC 0157:H7 sequences that differ from the K12 sequences are pathogenic simply because
EHEC 0157:H7 is highly pathogenic and K12 is not. Hideo et al. do not provide any
experimental data or other evidence to support this assertion.

The Applicants respectfully submit that the assumption that any sequence present in a
pathogenic organism and absent from a non-pathogenic organism is necessarily useful is incorrect
and that very few of the EHEC 0157:H7-specific sequences are implicated in human disease.
More specifically, non-pathogenic K12 and EHEC 0157:H7 share about 80% sequence identity
and are about 20% different. Given that the genomes of these organisms are about 4 million base
pairs, the difference is about 800,000 base pairs. All the known virulence factors are encoded by
only about 50,000-100,000 base pairs (e.g., the LEE region, which encodes the Type III secretion
system, is 34,000 base pairs, the Shiga toxin is 1,000 base pairs, etc.), thus making up only a small
fraction of genomic differences between non-pathogenic K12 and pathogenic EHEC 0157:H7. For
example, many of the non-LEE encoded proteins have no effect on virulence, and are found in
0157 but not in non-pathogenic E. coli. Accordingly, one of ordinary skill in the art would
recognize that not all of the EHEC 0157:H7-specific sequences set out in Hideo et al. encode 
virulence factors. Furthermore, the inventors of the present application were the first to identify 
NleA polypeptide (SEQ ID NOs: 22-24) as a virulence factor. The term "virulence factor" is
understood by those of skill in the art as a molecule required to cause disease, that is not normally 
required for viability of the micro-organism producing it in non-disease settings. It is further well 
known to a skilled person that, once identified, a virulence factor is useful to induce an immune 
response in animals, but prior to such identification there would be no reason to conclude that any 
protein would be useful. 

Furthermore, Hideo et al. make it clear that the contemplated use of the disclosed 
sequences is in the context of treating infection in a patient. The term "infection" is defined as
"[i]nvasion by and multiplication of pathogenic microorganisms in a bodily part or tissue, which
may produce subsequent tissue injury and progress to overt disease through a variety of cellular
or toxic mechanisms" or the "pathological state resulting from having been infected." (see,
infection, Dictionary.com, The American Heritage® Stedman's Medical Dictionary. Houghton
Mifflin Company. http://dictionary.reference.com/browse/infection, accessed: November 18,
2010). Therefore, the term "infection" contemplates that the infected subject or animal exhibits
symptoms of clinical disease. By contrast, ruminants may be colonized by and shed highly
virulent A/E pathogens, or exhibit an immune response against an A/E pathogen or component
thereof, without ever exhibiting symptoms of overt disease. 

Accordingly, the Applicants respectfully submit that Hideo et al. do not teach methods
relating to use of an isolated polypeptide comprising an amino acid sequence having at least 75% 
sequence identity to the sequence of SEQ ID NO: 24 in ruminants, as claimed, and therefore do 
not anticipate the claimed invention. 

For the sake of completeness, the Applicants note that the Examiner also asserts that
Wright et al. teach that E. coli 0157:H7 infects animals. With respect, Wright et al. disclose that
E. coli 0157:H7 "was the first EHEC strain identified in humans and remains the most common
infectious cause of bloody diarrhea and hemorrhagic colitis in humans" (col. 1, ll. 34-36, emphasis
added). Wright et al. also disclose that ". . .cattle, pigs, lambs and poultry may all be
environmental reservoirs for verocytotoxin-producing enterohemorrhagic E. coli" (col. 1, ll. 49-51,
emphasis added). The term "reservoir" is defined as "[a]n organism or a population that
directly or indirectly transmits a pathogen while being virtually immune to its effects" (see,
reservoir, Dictionary.com, The American Heritage® Stedman's Medical Dictionary. Houghton
Mifflin Company. http://dictionary.reference.com/browse/infection, accessed: November 18,
2010). Accordingly, the Examiner is in error in stating that Wright et al. teach that E. coli
0157:H7 infects animals since the term "infection" contemplates that the affected subject or
animal exhibits symptoms of clinical disease.

Rejections Under 35 U.S.C. § 103 

Claims 53-58, 71-72, 86 and 90-94 are rejected under 35 U.S.C. 103(a) as allegedly 
obvious over Hideo et al. (supra) in view of Wright et al. (supra).

More specifically, the Examiner alleges that Hideo et al. disclose a method of eliciting an
immune response against E. coli 0157:H7 by administering an effective amount of a composition
for inducing an immune response against E. coli 0157:H7 comprising a protein identical to SEQ
ID NO: 24. The Examiner further alleges that Hideo et al. also disclose treating an infection by
E. coli 0157:H7 using the composition. The Examiner further alleges that treatment of the E.
coli infection will result in reduction in colonization and shedding of E. coli in an animal. While
the Examiner concedes that Hideo et al. do not disclose that the animal is a ruminant or bovine or
ovine or human, the Examiner alleges that it is inherent that the methods of Hideo et al. are to be
practiced in animals since Wright et al. teach that E. coli or E. coli 0157:H7 infect animals, such
as cattle, lamb and humans, and cause diarrhea.

The Examiner therefore alleges that it would have been prima facie obvious to one of 
ordinary skill in the art at the time the instant invention was made to have used the method of 
Hideo et al. for animals such as cattle, lamb and humans, thus resulting in the instant invention
with a reasonable expectation of success. The Examiner finds the motivation to do so in the
teachings of Wright et al. that E. coli or E. coli 0157:H7 infect cattle, lamb and humans and
cause diarrhea. 

This rejection is respectfully traversed. The Applicants respectfully submit that, further to 
the Examination Guidelines for Determining Obviousness Under 35 U.S.C. § 103 in View of the
Supreme Court Decision in KSR International Co. v. Teleflex Inc. (72 Fed. Reg. 57,526 (Oct. 10, 
2007); hereafter the "Guidelines"), a proper rejection under 35 U.S.C. § 103 requires: 

1. a finding that the prior art included each element claimed, although not 
necessarily in a single prior art reference, with the only difference between the 
claimed invention and the prior art being the lack of actual combination of the
elements in a single prior art reference; 

2. a finding that one of ordinary skill in the art could have combined the elements 
as claimed by known methods, and that in combination, each element merely 
would have performed the same function as it did separately;

3. a finding that one of ordinary skill in the art would have recognized that the
results of the combination were predictable; and 

4. whatever additional findings based on the Graham factual inquiries may be
necessary, in view of the facts of the case under consideration, to explain a
conclusion of obviousness. 

In the present case, as discussed herein, the teachings of Hideo et al. are speculative and
the teachings of Wright et al. do not cure the defect in Hideo et al.

More specifically, Hideo et al. compare the sequence of the pathogenic bacterium, EHEC 
0157:H7, with that of the non-pathogenic K12 strain, and speculate that the approximately 2000 
EHEC 0157:H7 sequences that differ from the K12 sequences are pathogenic simply because 
EHEC 0157:H7 is highly pathogenic and K12 is not. For example, Hideo et al. speculate that a
protein that is "predicted to be a cell surface protein ... may be useful for production of an
antibody, vaccine composition, diagnosis of 0-157 infection . . .," that ". . . there is a possibility
that they include a protein which has an important function in 0-157, for example, transportation
and metabolism of a substance, processing of nucleic acids, and relates to a regulatory element
and pathogenicity..." and that such proteins "... are to be useful for diagnosis and therapy of
0-157 infection." Hideo et al. do not provide any experimental data or other evidence to support
these speculations. 

As is discussed in detail above, the assumptions made by Hideo et al. are incorrect and
very few of the EHEC 0157:H7-specific sequences are implicated in human disease.
Accordingly, one of ordinary skill in the art would recognize that not all of the EHEC
0157:H7-specific sequences set out in Hideo et al. encode virulence factors.

Furthermore, SEQ ID NO: 393 of Hideo et al., which is asserted to be identical to SEQ
ID NO: 24 of the instant application, is identified as being of unknown function but similar to a
capsid protein. Capsid proteins are bacteriophage (bacterial virus) proteins used as part of the viral
assembly process and would not be expected to be effective in the methods claimed in the instant
application. As discussed above, the inventors of the present application were the first to identify
NleA polypeptide (SEQ ID NOs: 22-24) as a virulence factor, which would then lead a skilled
person to conclude that it would be useful to induce an immune response. Prior to such
identification there would be no reason to conclude that any protein would be useful in the
methods claimed in the instant application.

Finally, Hideo et al. make it clear that the contemplated use of the disclosed sequences is
in the context of treating infection in a patient, rather than in a ruminant, which may be colonized
by and shed highly virulent A/E pathogens, or exhibit an immune response against an A/E
pathogen or component thereof, without ever exhibiting symptoms of overt disease.

Turning to Wright et al., this reference discloses that E. coli 0157:H7 "was the first 
EHEC strain identified in humans and remains the most common infectious cause of bloody 
diarrhea and hemorrhagic colitis in humans" (col. 1, ll. 34-36, emphasis added). Wright et al. also
disclose that ". . .cattle, pigs, lambs and poultry may all be environmental reservoirs for
verocytotoxin-producing enterohemorrhagic E. coli" (col. 1, ll. 49-51, emphasis added). The
term "reservoir" is defined as "[a]n organism or a population that directly or indirectly transmits a
pathogen while being virtually immune to its effects" (see, reservoir, Dictionary.com, The
American Heritage® Stedman's Medical Dictionary. Houghton Mifflin Company.
http://dictionary.reference.com/browse/infection, accessed: November 18, 2010). Therefore,
contrary to the Examiner's assertion, Wright et al. do not teach that E. coli 0157:H7 infects 
ruminants since the term "infection," as indicated herein, contemplates that the infected subject or 
animal exhibits symptoms of clinical disease. 

Accordingly, one of ordinary skill in the art would not have recognized that the results of 
the combination of Hideo et al. and Wright et al. were predictable since Hideo et al. provide no 
guidance as to which of over 2000 sequences may be useful and Wright et al. do not cure this 
defect. Therefore, Hideo et al., considered alone or in combination with Wright et al., do not
render the claimed invention obvious.

Claims 53-55, 71-72, 86, 88-89 and 90 are rejected under 35 U.S.C. 103(a) as allegedly
obvious over Hideo et al. (supra) as evidenced by Wright et al. (supra) in view of Finlay et al. 
(supra). 

More specifically, the Office Action alleges that Hideo et al. disclose a method of eliciting
an immune response against E. coli 0157:H7 by administering an effective amount of a
composition for inducing an immune response against E. coli 0157:H7 comprising a protein
identical to SEQ ID NO: 24. The Office Action further alleges that Hideo et al. also disclose
treating an infection by E. coli 0157:H7 using the composition. The Office Action further alleges
that treatment of the E. coli infection will result in reduction in colonization and shedding of E.
coli in an animal. The Office Action further alleges that it is inherent that the methods of Hideo et
al. are to be practiced in animals since Wright et al. teach that E. coli or E. coli 0157:H7 infect
animals.

While the Office Action concedes that Hideo et al. do not disclose that the composition 
further comprises EspA, EspB, EspD, EspC, intimin and Tir or an adjuvant, the Office Action
alleges that Finlay et al. teach methods for eliciting an immune response against an A/E pathogen 
or component thereof, or for reducing colonization of an A/E pathogen, or of reducing shedding 
(thus allegedly treating an infection by an A/E pathogen) in an animal by administering an 
effective amount of a composition comprising a culture supernatant where the composition 
includes EspA, EspB, EspD, EspC, intimin and Tir and/or further includes an adjuvant. The 
Office Action further alleges that Finlay et al. teach that the composition treats the EHEC
infection and/or reduces colonization of the animal and teach that administration of the
composition to an animal stimulates an immune response against one or more secreted antigens,
such as EspA and Tir, which blocks attachment of the EHEC to intestinal epithelial cells.

The Office Action therefore alleges that it would have been prima facie obvious to one of 
ordinary skill in the art at the time the instant invention was made to have combined the 
composition of Hideo et al. with that of Finlay et al., thus resulting in the instant method (wherein 
the composition further comprises EspA, EspB, EspD, EspC, intimin and Tir or further comprises 
an adjuvant) with a reasonable expectation of success. The Office Action finds the motivation to 
do so because both compositions are allegedly individually taught in the prior art to be useful for 
the same purpose, i.e., inducing an immune response against E. coli EHEC O157:H7, and Finlay et 
al. allegedly provide additional motivation in that administration of the composition to an animal 
stimulates an immune response against one or more secreted antigens, such as EspA and Tir, that 
blocks attachment of the EHEC to intestinal epithelial cells. 

This rejection is respectfully traversed. As discussed herein, in the sections addressing the 
rejections under 35 U.S.C. 102(b) and 103(a) with respect to Hideo et al., one of ordinary skill in 
the art would not have recognized that the results of the combination of Hideo et al. and Wright 
et al., with or without Finlay et al., were predictable since Hideo et al. provide no guidance as to 
which of approximately two thousand (2000) sequences may be useful and Wright et al. do not 
cure this defect. The addition of Finlay et al. also does not cure the defects in Hideo et al. and 
Wright et al. More specifically, as indicated herein, the inventors of the present application were 
the first to identify NleA polypeptide (SEQ ID NOs: 22-24) as a virulence factor, which would 
then lead a skilled person to conclude that it would be useful to induce an immune response. 
Prior to such identification there would be no reason to conclude that any protein would be useful 
in the methods claimed in the instant application, and Finlay et al. do not provide such 
identification. Accordingly, Hideo et al., considered alone or in combination with Wright et al. 
and/or Finlay et al., do not render the claimed invention obvious. 

Conclusion 

The Applicants respectfully request that a timely Notice of Allowance be issued in this 
case. 



Respectfully submitted, 

Date: January 5, 2011 By: /Andrew T. Serafini/ 

Andrew T. Serafini, Reg. No. 41,303 
Fenwick & West LLP 
801 California Street 
Mountain View, CA 94041 
Telephone: (206) 389-4596 
Facsimile: (650) 938-5200 

Molecular Characterization and Distribution of Genes Encoding 
Members of the Type III Effector NleA Family among 
Pathogenic Escherichia coli Strains 

Kristina Creuzburg and Herbert Schmidt* 
In this study, we investigated the occurrence of the previously described gene nleA4795 and variants of 
nleA putatively encoding non-locus-of-enterocyte-effacement-encoded type III effector proteins with functions that 
are unknown. nleA variants were detected in 150 out of 170 enteropathogenic Escherichia coli strains and 
enterohemorrhagic E. coli strains, two of them being eae negative. Besides the known variants nleA4795, Z6024, 
and the espI-like gene, 11 novel nleA variants with different lengths and sequence identities at the deduced 
amino acid level (between 71% and 96%) have been identified. Whereas most of the serogroups associated with 
more severe disease were quite homogeneous with respect to the presence of a particular nleA variant, other 
serogroups were not. Moreover, Southern blot hybridization revealed that certain strains carry two copies of 
nleA in their chromosome, frequently encoding different variants. In most cases, the open reading frame of one 
of the copies was disrupted, usually by an insertion element. Furthermore, transmission of the type III 
effector-encoding gene could be shown by transduction of nleA-carrying bacteriophages to a laboratory E. coli 
strain. 



Enterohemorrhagic Escherichia coli (EHEC) and enteropathogenic 
E. coli (EPEC) can cause serious gastrointestinal 
diseases and are able to damage the gut epithelia of their hosts 
by a sophisticated mechanism of attachment and effacement 
(11). Following adherence to intestinal cells, attaching and 
effacing (A/E) E. coli organisms interfere with cytoskeletal 
processes and produce a specific histopathological feature that 
[...] localized destruction of the brush border 
[...] termed the locus of enterocyte effacement (LEE) (14). 
[...] host survival and pathogen transmission (7). NleB is probably 
a virulence determinant (20), whereas EspK could be involved 
in intestinal colonization (38). 
[...] shows 81% identity at the amino acid level to the protein 
Z6024, encoded by phage CP-933P in E. coli O157:H7 strain 
[...] for virulence (16, 27), but the function of this effector protein 
[...] 

TABLE 1. PCR primers, cycling conditions, and PCR product lengths 

Primer for nleA                                             Annealing    Elongation   PCR product 
target gene variants   Primer nucleotide sequence           temp (°C)    time (s)     length (bp) 

V83-for2               5'-ACAGCAACATGCACCGGAATGC-3'         58           90           959-1,112 
V83-rev2               5'-CTTCCATCGCA[...]TATATCAGC-3' 
V83-for2               [...]                                55           90           1,015-1,168 
V83-rev3               5'-GATATCGATGACCACATCTTCAGG-3' 
VarA-for               5'-TATTAAAGCTGTCC[...]TGGG-3'        50           120          1,434-1,584 
[...]                  5'-TGGTGTATTTGTT[...]GTGGGG-3' 
[...]                  5'-AGCTTAGACTCTTGTTTCTCG-3' 

[...] 56°C. After the addition of 3 µl RNase A (100 mg/ml) (Sigma-Aldrich) and 200 
µl distilled water, the mixture was incubated for another [...] min at 37°C. Finally, 
after extraction with an equal volume of phenol-chloroform-isoamyl alcohol 
(25:24:1), the liquid phase was transferred into a fresh tube. The genomic DNA 
[...] 
(pH 7.2) for 30 min at -20°C, followed by centrifugation for 30 min at 13,000 
rpm and 4°C. The pellet was washed with 70% ethanol and dissolved in 100 µl 
[...] 

[...] purification of two [...] extracts with the QIAEX II gel extraction kit (QIAGEN) 
and was labeled with the Klenow fragment of the DNA polymerase, included in 
the DIG DNA labeling and detection kit (Roche Diagnostics). For identification 
of the size of the obtained DNA fragments, a GeneRuler 1-kb DNA ladder and 
[...] 

Phage transduction. An overnight culture of the respective E. coli strain was 
used to inoculate [...] ml of Luria-Bertani broth containing 1 ml 1 M CaCl2, 
followed by an incubation with vigorous shaking until an optical density at 600 
[...] 
flask with the bacterial suspension [...] the phage particles 
[...] separated from the cell [...] 
followed by filtration through a [...] filter (Whatman) to remove bacterial 
[...] RNase (Sigma-Aldrich) [...] sodium 
chloride was [...] and was dissolved, 
and the solution was incubated on ice [...]. After centrifugation ([...], 
10 min, 4°C), the phage particles were precipitated [...] 
dissolved at room [...] was incubated on ice for 1 h. Phage 
[...] The resulting 
phage pellet was dried at room temperature and [...] in 1 ml of SM buffer 
[...] 
[...] incubation for 48 h at 37 [...] 



TABLE 2. Distribution of stx types, eae subtypes, and nleA variants in pathogenic E. coli isolates and restriction fragment lengths from Southern blot hybridizations with [...] 

[Table body not recoverable from the source. Legible column headings: Disease; stx type; eae type; Host; (no. of cases); nleA variant(s). Legible entries include rabbit, rabbit/chicken, O103:H2 (1), Z6024, and the espI-like gene.] 

[...] determined by PCR with primer 
V83-for2 in combination with either V83-rev2 or V83-rev3, 
which is complementary to conserved regions of this gene. 
Primer V83-rev3 was constructed because it was not possible to 
amplify a PCR product from all strains with the primer V83-rev2. 
Total DNA of PCR-negative strains [...] probe [...] 
was hybridized with [...] 
[...] :H8 isolates (Table [...] 
[...] two food isolates but were absent from four eae-negative animal 
isolates and one eae-positive animal isolate. Moreover, 
they were found in 119 out of the 135 human isolates, whereas 
the 15 nleA-negative isolates included 8 eae-positive and 8 
eae-negative E. coli strains. In addition, we were able to detect 
nleA genes in all 48 HUS isolates as well as in 40 out of 48 
human strains associated with diarrheal disease (Table 2). 

Restriction analyses of the nleA variants. The PCR products 
obtained with primer pairs V83-for2/V83-rev2 and V83-for2/ 
V83-rev3 did not have the same length. Their sizes ranged 
from 959 bp to 1,168 bp (Table 1). Two restriction enzymes 
were chosen for differentiation because of the similarity of 
nleA4795 and Z6024. Separate restrictions of all PCR products 
with BseNI (Fig. 1A) and PstI (Fig. 1B) showed 11 different 
patterns among the 150 PCR products, in addition to the 
already known variants nleA4795, Z6024, and the espI-like gene 
of E. coli. The expected restriction pattern for nleA/espI of C. 
rodentium was not observed. After restriction with BseNI, PCR 
products obtained with strains 5721/96 and 0917/99, depicted 
in lanes 4 and 5, respectively, showed the same pattern (Fig. 1), 
whereas different patterns were obtained by using PstI. Furthermore, 
PCR products of strains 5721/96, 3439/00, CF11201, 
and CB6389 in lanes 4, 8, 9, and 10 as well as those of strains 
PT272 and CB8745 in lanes 6 and 11 share the same PstI 
restriction pattern (Fig. 1), in contrast to their varying BseNI 
patterns. Moreover, the addition of molecular weights of 
BseNI and PstI restriction fragments of strain 1247/95 in lane 
13 revealed molecular weights that were approximately double 
the weights of the other strains. Based on this observation and 
the arrangement of the restriction fragments, we concluded 
that this PCR product is a mixture of the two nleA variants 
shown in lane 2 and lane 8 (Fig. 1). 
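
The fragment-sum reasoning in the preceding paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the `max_products` limit, and any fragment sizes passed to it are hypothetical, and only the 959 to 1,168 bp product-length range is taken from the text (Table 1). Fragment sizes read from a gel are approximate, so a real analysis would need some tolerance.

```python
def infer_product_count(fragment_sizes_bp, min_len_bp=959, max_len_bp=1168, max_products=3):
    """Estimate how many PCR products a digest contains from summed fragment sizes.

    A complete digest of a single PCR product yields fragments whose sizes add
    up to the product length. If the sum fits k times the expected length
    range, the digest is consistent with k co-amplified products.
    The default length range is the one reported in the text; everything else
    is an illustrative assumption.
    """
    total = sum(fragment_sizes_bp)
    for k in range(1, max_products + 1):
        if k * min_len_bp <= total <= k * max_len_bp:
            return total, k
    # The sum fits none of the tested ranges (for example, a partial digest
    # or fragments too small to be seen on the gel).
    return total, None
```

For example, hypothetical fragments of 400, 350, and 300 bp sum to 1,050 bp, which fits a single product, whereas a lane whose fragments sum to roughly twice that value is consistent with a mixture of two products, as concluded for lane 13.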

Molecular characterization of the nleA variants. In order to 
prove the assumption that each restriction pattern represents 
an independent nleA variant, for each pattern, one strain of 
each serogroup was selected and the respective PCR product 
[...] 
[...] of the restriction pattern shown in Fig. 1, lane 13, led 
to the verification of the hypothesis that some strains may carry 
more than one copy of nleA. Therefore, Southern blot hybridization 
was performed (Fig. 2). 

Two copies of nleA variants were detected in most of the 
nleA-positive isolates of serogroups O26, O84, and O156. Furthermore, 
two copies were detectable in three O49:NM strains, 
originating from pigs that probably were from the same farm, 
and one human O103:H2 isolate. All the other nleA-positive 
O49 and O103 strains examined possessed only one copy of the 
gene. This was also true for all E. coli O111, O128, O145, and 
O157 strains (Fig. 2; Table 2). 

DNA fragments were amplified by using primer VarA-for 
either in combination with VarA-rev, which binds in the region 
[...] 
of the TOPO TA cloning kit (Invitrogen). If two PCR products 
were amplified, the PCR product with the expected size was 
extracted from a gel prior to sequencing. 

DNA sequencing resulted in the identification of 11 new 
nleA variants, termed nleA1 to nleA11, besides the three known 
variants of pathogenic E. coli (Fig. 3). We defined an open 
reading frame (ORF) with a cutoff value of less than 97% 
sequence identity at the deduced amino acid level as an individual 
variant of nleA. By sequencing, two nleA variants were 
[...] variants nleA1, nleA2, 
and the espI-like gene. PCR products with the primer pair 
V83-for2/V83-rev2 from all concerned strains were restricted 
either by Bpu1102I or by CseI. Moreover, many members of 
variant nleA8-1 (see below) differ in 1 bp in the recognition site 
of BseNI, resulting in different restriction patterns. This is 
shown in lanes 4 and 8 in Fig. 1. To distinguish this variant, 
nleA8-1, from nleA3 and [...], which had the restriction pattern 
shown in lane 4 (Fig. 1), the restriction enzymes BclI, 
NheI, and SphI were used. 
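
The variant definition above (an ORF counts as an individual variant when it shares less than 97% identity at the deduced amino acid level) can be illustrated with a minimal Python sketch. Everything except the 97% cutoff is an assumption: the function names are hypothetical, the two sequences are expected to be already aligned (equal length, `-` for gaps), and identity is computed over alignment columns that are not gaps in both sequences, which may differ from the convention the authors used.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned amino acid sequences.

    Both inputs must have the same length and use '-' for gaps. Columns that
    are gaps in both sequences are ignored; a gap opposite a residue counts
    as a mismatch. This denominator is an assumed convention.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1  # equal and not both gaps, so both are residues
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


def is_distinct_variant(aligned_a, aligned_b, cutoff=97.0):
    """True if the pair falls below the identity cutoff stated in the text."""
    return percent_identity(aligned_a, aligned_b) < cutoff
```

With this convention, strain-specific differences of one to three residues in a protein of several hundred amino acids stay above the cutoff and do not create a new variant, which matches the way such point mutations are treated in the text.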



[...] sequence identities to each other of between 71% and 96% at 
the deduced amino acid level. Several variants showed strain-specific 
differences in the amino acid sequences that were 
caused by one to three point mutations. These mutations were 
not taken into consideration in this study. Only the strain-specific 
differences in the sequence of variant NleA6 and 
NleA8 were separated by an additional numerical suffix, because 
NleA6-2 possessed an insertion of four amino acids that 
were absent from NleA6-1, resulting in a deduced protein 
length of 462 amino acids. On the other hand, NleA8-1 and 
NleA8-2 varied in 10 amino acids of the C-terminal end (Fig. 
3). This different [...] associated with specific serogroups 
(Table [...]). [...] length of the 15 deduced [...] 

These variations are due to [...] amino 
acids, in particular of alanine, [...] 
middle region of the deduced [...] 
this region includes a putative transmembrane helix. Because 
of the absence of this region, [...] putative helix is missing from 
the variants EspI-like protein, NleA1, NleA2, NleA7, NleA10, 
and NleA of C. rodentium [...]. [...] putative transmembrane 
helix is located in the C-terminal third of the deduced 
amino acid sequence. This one is present in all variants 
described (Fig. 3). 

FIG. 2. Southern blot hybridization of different nleA variants with an nleA probe. The following E. coli strains were used (the variant 
harbored is named in parentheses): lane 1, O84:H4 strain 4795/97 (nleA4795); lane 2, O157:H7 strain [...] (Z6024); lane 3, O103:H2 strain 
2576/97 (the espI-like gene); lane 4, O156:H1 strain LTEC94460 (nleA1); lane 5, O145:NM strain 4672/99 (nleA[...]); lane 6, [...]:H2 strain 5721/96 
(nleA[...]); lane 7, O84:H2 strain CB7197 (nleA4); lane 8, O145:H28 strain 0917/99 (nleA5); lane 9, O127:H6 strain E2348/69 (negative control); 
lane 10, O156:H25 strain PT272 (nleA6-1 and nleA8-2); lane 11, O84:H[...] strain CB6T16 (nleA7 and nleA8-2); lane 12, O26:H11 strain 3439/00 
(nleA8-1); lane 13, [...] strain CF11201 (nleA8-2); lane 14, O84:[...] strain CB6389 (nleA8-2 and nleA9); lane 15, O156:H25 strain CB8745 
(nleA6-2 and nleA8-2); lane 16, O49:NM strain CB7690 (nleA10); lane 17, O26:H11 strain 5720/96 (Z6024 and nleA8-1); lane 18, O145:H28 strain 
DG264/4 (nleA11). M1 is the lambda mix marker 19; M2 is a GeneRuler 1-kb DNA ladder (Fermentas). 

Most of the strains of serogroups O26, O84, and O156, as 
well as three O49:NM isolates from pigs, probably originating 
from the same farm, and one O103:H2 strain, harbored two 
[...] 



variant nleA8-2 because of the insertion of the insertion sequence 
(IS) element ISEc8 651 bp upstream of the 3' end of 
the gene. In contrast, 117 bp of the 5' end of variant nleA8-2, 
present in serogroup O156, was missing. The inserted sequence 
resembled the region upstream of nleA4795 of the 
prophage BP-4795 and, to some extent, an ISEc8 element. 
Therefore, 179 bp that was in BP-4795 was missing from this 
sequence. Furthermore, the first 34 bp of the espI-like gene of one 
O128:H2 pigeon isolate was deleted due to insertion of an IS 
element, in contrast to other strains of serotype O128:H2 that 
harbored a complete ORF of the espI-like gene. Variant nleA4 
also may encode a truncated protein because of the insertion of 5 
bp located 78 bp downstream of the 5' end of the gene. 

Most of the isolates of serogroups associated with severe 
human disease harbored variant Z6024, nleA8-1, or the espI-like 
gene (Table 2). Moreover, only one or three different 
variants of nleA could be detected from the serogroups O26, 
O157, O103, and O111. In contrast, serogroup O145 appeared 
to be heterogeneous, with six different variants. Whereas [...] 

[...] a truncated, possibly nonfunctional 
putative protein. This is also the case for nleA8-1 of 
the O84:H[...] isolate that harbored variants nleA8-1 and 
nleA8-2. Other strains of serogroup O84 harboring nleA7 and 
nleA8-2 or nleA9 and nleA8-2 showed a disrupted ORF of 
[...] 

[...]-positive O128 strains possessed only the espI-like gene. Variants 
Z6024, nleA8-1, and the espI-like gene occurred most 
frequently, followed by variants nleA8-2 and nleA2. In contrast, 
the other nine variants were detected only in one to six isolates 
(Table 2). Furthermore, most of the eae subtypes [...] and 
[...], detected in a larger number of isolates, were associated with 
four to eight different [...]. Because of the rare 
occurrence of most of the nleA variants as well as the association 
of the espI-like gene, nleA8-1, and Z6024 with at least two different 
eae subtypes, a direct correlation between a certain variant of 
nleA and a specific eae subtype could not be defined. 

FIG. 3. Alignment of the deduced amino acid sequences of the four known NleA variants, NleA4795, Z6024, the EspI-like protein, and NleA of 
C. rodentium, with the newly discovered variants NleA1 to NleA11. Only the N-terminal regions, the two putative transmembrane helices, and the C-terminal 
regions are shown. The position numbers refer to the sequence of NleA[...]. Identical amino acids are depicted by dots, and amino acids that are 
absent from a particular sequence are indicated by dashes. The putative transmembrane helices are [...] in gray. 

Transduction of nleA variants. Previously, it was shown that 
nleA4795 and the espI-like gene are located in the genome of 
prophages, which are fully inducible to produce phage particle 
progeny, whereas Z6024 is located on a cryptic prophage (6, 
24, 30). In order to determine whether the newly discovered 
variants are within intact phages that are able to spread the 
nleA-encoded T3SS effector by horizontal gene transfer, transduction 
experiments were carried out. A collection of 24 
pathogenic E. coli strains harboring different nleA variants 
[...] for these experiments (Table 3). 
[...] 
(Table 3). All strains harboring stx2 belonged to serogroups associated 
with [...] originated 
from patients with HUS. Only one strain, the O145:H28 isolate 
CB4973, also possessed the variant nleA6-1 in the genome of 
an inducible prophage. Furthermore, the three isolates 4795/97, 
01-08612, and CB6389 of serogroup O84, as well as the 
O49:NM strain CB7690, obviously harbored functional prophages 
carrying an nleA variant, which could be transduced in 
the E. coli K-12 strain C600/pK18 (data not shown). Strains 
CB7690, 4795/97, and 01-08612 carried variant nleA10 or 
[...] 
CB6389 possessed an intact prophage 
that is not disrupted by an IS element, and the similar variant 
nleA8-2 was located in the genome of this isolate as well. 
Moreover, each of these three O84 strains exhibited an intact 
Stx1-converting prophage. This also was the case for the 
O84:H2 isolate CB7197. However, it was not possible to transduce 
the variant nleA4 of this isolate in the E. coli strain 
[...] or a variant of 
nleA and originating from the other eight stx-positive or stx-negative 
strains were detectable after transduction in C600/pK18. 
Thus, we were able to demonstrate the transduction of 
five nleA variants [...] the method described above 
[...] 
other 19 analyzed E. coli strains also harbor inducible nleA 
phages that could not be detected in this assay. 

DISCUSSION 

The detection of different variants of the gene nleA in 150 
out of 170 E. coli strains examined shows the widespread occurrence 
of this non-LEE-encoded T3SS effector among 
pathogenic E. coli strains. With the exception of two strains, we 
could confirm the appearance of nleA in association with eae as 
determined by Mundy et al. (26). It has yet to be proven 
whether [...] are able to secrete NleA 
or if the gene represents a relic of extensive genetic rearrangement 
without any known function. Moreover, although the 
function of the virulence determinant NleA is unknown, the 
widespread distribution of the encoding gene points to an 
[...] 

TABLE 3. E. coli donor strains used for [...] 

Legible column headings: strain; [serotype]; host; disease; stx type; eae type; nleA type; [...]. NK, not known. 
Rows are listed in source order with legible entries only; cells that could not be read are shown as [...]. 

 1. [...]; O26:H11; human; HUS; [...]; β; Z6024 + nleA8-1 
 2. [...]; O26:H11; human; NK; 2; β; nleA8-1; [...] 
 3. [...]; human; HC 
 4. CB7197; O84:H2; [...]; NK 
 5. [...]; O84:H2; human; D 
 6. [...]; human; D; 1; [...]; stx1, nleA4795 
 7. 01-08612; O84:H[...]; human; NK 
 8. CB[...]; O84:H[...]; pig; NK 
 9. [...]; human; D; [...]; stx1, nleA9 
10. [...]; human; [...] 
11. [...]; human; HUS; [...]; espI-like; stx[...] 
12. 1639/77; [...]; human; D 
13. [...]; human; HUS; 1/2 
14. T4/97; [...]; pigeon; [...]; 2f; β; espI-like; stx2 
15. 0917/99; O145:H28; human 
16. [...]; human; HUS; [...]; nleA[...] 
17. CB4973; O145:H28; human; HUS; [...]; nleA6-1; stx1, nleA6-1 
18. [...]; O145:H28; pig; AS; [...]; nleA[...] 
19. [...]; O145:NM; human; HUS; 2; [...]; stx2 
20. [...]; O145:NM; human; HUS; 1/2 
21. CB8104; O145:NM; human; HUS; 2 
22. LTEC94460; [...]; human 
23. PT272; [...]; NK 
24. 2492/00; O157:H7; human; HUS; 2; [...]; Z6024 




[...] DNA [...] of different [...] amino acids 
[...] in the middle of the encoded deduced 
protein [...] some variants. [...] region, characterized 
by the [...] occurrence of alanine, serine, and threonine, 
includes a putative transmembrane helix. [...] In variants 
with a deletion of [...] amino acids, this helix is 
missing. Thus, these variants exhibit only one putative transmembrane 
helix, whereas other members of the nleA gene family 
possess two helices. At present, the role of the number of helices 
[...] 



[...] one or two different nleA variants, whereas less important 
serogroups contain a larger number of variants (Table 2). The 
majority of O26, O103, O111, and O157 strains harbor Z6024, 
the espI-like gene, or nleA8-1. On the other hand, strains of 
serogroups O49, O84, and O156 harbor a variety of members 
of the nleA gene family. Therefore, these strains may be depicting 
a pool for genetic rearrangements. 

Whereas Z6024 is harbored by the cryptic prophage CP-933P 
[...], nleA4795 and the espI-like gene are carried by inducible 
bacteriophages ([...], 24). Transduction experiments also 
revealed the location of nleA6-1, nleA9, and nleA10 in the 
genome of inducible phages. T3SS effector protein-encoding 
genes often are present at one end of bacteriophages, presumably 
a result of incorrect excision during the lytic life cycle. 
Moreover, [...] bacteriophages that carry a 
variant of the gene nleA to a laboratory E. coli strain raises the 
possibility that NleA-converting bacteriophages can be spread 
[...] 

[...] by EHEC or EPEC. Such an association also was shown for Stx 
variants. Stx2 often causes more severe disease than those 
caused by Stx1, whereas differences appear among the heterogeneous 
members of the Stx2 group (2, 12). Until now, no 
significant correlation could be determined between the occurrence 
of a certain nleA variant and the appearance of a specific 
[...] 

[...] nleA gene family to infect the E. [...] 
another lysogenic phage in the genome of this E. coli strain 
[...] were destroyed by genetic rearrangements. These results indicate 
a major [...] phage [...] the distribution of the 
members of the nleA gene family. 

[...] 
pathogenic E. coli strains are located on mobile genetic elements. 
To elucidate the role of the phage-encoded type III 
effectors in more detail, further research is needed. 

Appendix B: Hideo et al. Full Translation 

CLAIMS 

1. A nucleic-acid molecule specific to enterohemorrhagic 
pathogenic-E. coli O157:H7. 

2. The nucleic-acid molecule of claim 1, which is a 
nucleic-acid molecule specific to enterohemorrhagic 
pathogenic-E. coli O157:H7 and has 

(a) a nucleotide sequence selected from a group 
comprising the following SEQ IDs: SEQ ID NO:l, SEQ ID NO: 
132, SEQ ID NO: 244, SEQ ID NO: 337, SEQ ID NO: 410, SEQ ID 
NO: 484, SEQ ID NO: 554, SEQ ID NO: 630, SEQ ID NO : 689, 

45 SEQ ID NO: 755, SEQ ID NO : 816, SEQ ID NO : 876, SEQ ID NO: 
927, SEQID NO: 978, SEQ ID NO: 1013, SEQ ID NO: 1029, SEQ 
ID NO: 1055, SEQ ID NO: 1060, SEQ ID NO: 1093, SEQ ID NO: 
1128, SEQ ID NO: 115 7, SEQ. ID NO: 1191, SEQ ID NO: 1212, 
SEQ ID NO: 1240, SEQ ID NO: 1258, SEQ ID NO: 1274, SEQ ID 

50 NO: 1288, SEQ ID NO: 1302, SEQ ID NO: 1309, SEQ. ID NO: 1321, 
SEQID NO: 1329, SEQ ID NO: 1338, SEQ ID NO: 1.348, SEQ ID 
NO: 1359, SEQ ID NO: 1366, SEQ ID NO: 1374, SEQ ID NO: 1380, 
SEQ ID NO: 1386, SEQ ID NO: 1394, SEQ ID NO: 1401, SEQ ID 
NO: 1408, SEQ ID NO: 1411, SEQ ID NO: 1418, SEQ ID NO: 1426, 

55 SEQ ID NO: 1436, SEQ ID NO: 1443, SEQ ID NO: 1450, SEQID 
NO: 1457, SEQ ID NO: 1460, SEQ ID NO: 1467, SEQ ID NO: 1471, 
SEQ IDNO: 1473, SEQ ID NO: 1478, SEQ ID NO: 1487, SEQ ID 
NO: 1489, SEQ ID NO: 1494, SEQ ID NO: 1499, SEQ, ID NO: 
1501, SEQ ID NO: 1506, SEQ ID NO: 1508, SEQ ID NO: 1510, 

60 SEQ ID NO: 1511, SEQ ID NO: 1516, SEQ. ID NO: 1520, SEQ ID 
NO: 1526, SEQ. ID NO: 1532, SEQ ID NO: 1537, SEQ ID NO: 1540, 
SEQ ID NO: 1545, SEQ ID NO: 1547, SEQ ID NO: 1549, SEQ ID 
NO: 1551, SEQ ID NO: 1553, SEQ ID NO: 1555, SEQ. ID NO: 1558, 
SEQ ID NO: 1563, SEQ ID NO : 1566, SEQ ID NO: 1569, SEQ ID 

65 NO: 1571, SEQ ID NO: 1576, SEQ ID NO: 1580, SEQ ID NO: 1584, 
SEQ ID NO: 1587, SEQ ID NO: 1591, SEQ. ID NO: 1594, SEQID 
NO: 1596, SEQ. ID NO: 1599, SEQ. ID NO: 1601, SEQ ID NO: 1603, 
SEQ ID NO: 1604, SEQ ID NO: 1605, SEQ. ID NO: 1607, SEQ ID 
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NO : 1612, SEQ ID NO: 1615, SEQ ID NO : 1617, SEQ ID NO : 1619, 

7 0 S E Q I D NO: 1622, S E Q I D NO: 1624, S E Q [ D N 0:1626, 8 E Q I D 
NO: 1627, SEQ ID NO : 162 9, SEQ ID NO : 1 632, SEQID NO: 1635, 
SEQ ID NO: 1636, SEQ ID NO: 1637, SEQ ID NO: 1639, SEQ 
IDNO:1640, SEQ ID NO: 1643, SEQ ID NO : 1646, SEQ ID NO: 
1649, SEQ ID NO: 1652, SEQ ID NO : 1655, SEQ ID NO: 1658, 

75 SEQ ID NO: 1660, SEQ ID NO: 1662, SEQ ID NO: 1664, SEQ ID 
NO: 1666, SEQ ID NO: 1668, SEQ ID NO: 1669, SEQ ID NO: 1670, 
SEQ ID NO: 1672, SEQ ID NO : 1673, SEQ ID NO: 1675, SEQ 
IDNO: 1677, SEQ ID NO: 1680, SEQ ID NO: 1682, SEQ ID NO: 
1683, SEQ ID NO: 1685, SEQ. ID NO : 1688, SEQ ID NO: 1690, 

80 SEQ ID NO: 1691, SEQ ID NO: 1694, SEQ ID NO: 1696, SEQ ID 
NO: 1699, SEQ ID NO: 1700, SEQ ID NO: 1701, SEQ ID NO: 1704, 
SEQ ID NO: 1705, SEQ ID NO: 1706, SEQ ID NO: 1707, SEQID 
NO: 1708, SEQ ID NO: 1709, SEQ ID NO: 1710, SEQ ID NO: 1711, 
SEQ ID NO: 1712, SEQ ID NO: 1713, SEQ ID NO: 1715, SEQ ID 

85 NO: 1716, SEQ ID NO: 1717, SEQ ID NO: 1718,, SEQ ID NO: 
.1.719, SEQ ID NO: 1720, SEQ ID NO: 1721, SEQ ID NO: 1722, 
SEQ ID NO: 1723, SEQ ID NO: 1724, SEQ ID NO: 1725, SEQ ID 
NO: 1726, SEQ ID NO : 1727, SEQ ID NO: 1728, SEQ ID NO: 1729, 
SEQ IDNO: 1730, SEQ ID NO: 1731, SEQ ID NO: 1732, SEQ ID 

90 NO: 1733, SEQ ID NO: 1734, SEQ ID NO : 1 735, SEQ ID NO: 1736, 
SEQ ID NO: 1737, SEQ ID NO: 1738, SEQ ID NO: 1739, SEQ ID 
NO: 1740, SEQ ID NO: 1741, SEQ ID NO: 1742, SEQ ID NO: 1743, 
SEQ ID NO : 1744, SEQ ID NO: 1745, SEQ ID NO: 1746, SEQID 
NO: 1747, SEQ ID NO: 1748, SEQ ID NO: 1749, SEQ ID NO: 1750, 

95 SEQ ID NO: 1751, SEQ ID NO: 1752, SEQ ID NO: 1753, SEQ ID 
NO: 1754, SEQ ID NO: 1755, SEQ ID NO: 1756, SEQ ID NO: 1757, 
SEQ ID NO: 1758, SEQ ID NO: 1759, SEQ ID NO: 1760, SEQ ID 
NO: 1761, SEQ ID NO: 1762, SEQ ID NO: 1763, SEQID NO: 1764, 
SEQ ID NO: 1765, SEQ. ID NO: 1766, SEQ. ID NO: 1767, SEQ 
100 IDNO: 1768, SEQ. ID NO: 1769, SEQ ID NO: 1770, SEQ ID NO: 
1771, SEQ ID NO: 1772, SEQ. ID NO: 1773, SEQ ID NO: 1774, 
SEQ ID NO: 1775, SEQ ID NO: 1776, SEQ. ID NO: 1777, SEQ ID 
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NO: 1778, SEQ ID NO : 1779, SEQ ID NO: 1780, SEQ ID NO: 1781, 
SEQ ID NO: 1782, SEQ ID NO: 1783, SEQ ID NO: 1784, SEQ 

105 IDNO: 1785, SEQ. ID NO: 1786, SEQ ID NO: 1787, SEQ ID NO: 
1788, SEQ ID NO: 1789, SEQ. ID NO: 1790, SEQ ID NO: 1791, 
SEQ ID NO: 1792, SEQ ID NO : 1793, SEQ ID NO: 1794, SEQ ID 
NO: 1795, SEQ ID NO: 1796, SEQ ID NO: 1797, SEQ ID NO: 1798, 
SEQ ID NO: 1799, SEQ ID NO: 1800, SEQ ID NO: 1801, SEQID 

110 NO: 1802, SEQ ID NO: 1803, SEQ ID NO: 1804, SEQ ID NO: 1805, 
SEQ ID NO: 1806, SEQ ID NO: 1807, SEQ ID NO: 1808, SEQ ID 
NO: 1809, SEQ ID NO: 1810, SEQ ID NO: 1811, SEQ ID NO: 1812, 
SEQ ID NO: 1813, SEQ ID NO: 1814, SEQ ID NO: 1815, SEQ ID 
NO: 1816, SEQ ID NO: 1817, SEQ ID NO: 1818, SEQID NO: 1819, 

115 SEQ ID NO: 1820, SEQ ID NO: 1821, SEQ ID NO: 1822, SEQ 
IDNO: 1823, SEQ ID NO: 1824, SEQ ID NO: 1825, SEQ ID NO: 
1826, SEQ ID NO: 1827, SEQ ID NO: 1828, SEQ. ID NO: 1829, 
SEQ ID NO: 1830, SEQ ID NO: 1831, SEQ ID NO: 1832, SEQ ID 
NO: 1833, SEQ ID NO: 1834, SEQ ID NO: 1835, SEQ ID NO: 1836, 

120 SEQ ID NO: 1837, SEQ ID NO: 1838, SEQ ID NO: 1839, SEQ 
IDNO: 1840, SEQ ID NO: 1841, SEQ ID NO: 1842, SEQ ID NO: 
1843, SEQ ID NO: 1844, SEQ. ID NO: 1845, SEQ ID NO: 1846, 
SEQ ID NO: 1847, SEQ ID NO: 1848, SEQ ID NO: 1849, SEQ ID 
NO: 1850, SEQ ID NO: 1851, SEQ ID NO : 1852, SEQ ID NO: 1853, 

125 SEQ ID NO: 1854, SEQ ID NO: 1855, SEQ ID NO: 1856, SEQID 
NO: 1857, SEQ ID NO: 1858, SEQ ID NO: 1859, SEQ. ID NO: 1860, 
SEQ ID NO: 1861, SEQ ID NO: 1862, SEQ ID NO: 1863, SEQ ID 
NO: 1864, SEQ. ID NO: 1865, and SEQ ID NO: 1866 

(b) a moiety in the nucleotide sequences set forth in (a); 

(c) a complementary nucleotide sequence to the 
nucleotide sequences set forth in (a) or (b); or 

(d) a nucleotide sequence hybridizing to the nucleotide 
sequences set forth in (a), (b) or (c) under a stringent condition. 

3. The nucleic-acid molecule of claim 1, which is a 
nucleic-acid molecule encoding a polypeptide specific to 
enterohemorrhagic pathogenic-E. coli O157:H7 and encodes 
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(a) an amino acid sequence selected from a group 
comprising the following SEQ IDs or a moiety thereof: SEQ ID 
NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 
6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10,
SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14,
SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18,
SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 22,
SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26,
SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30,
SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34,
SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38,
SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 41, SEQ ID NO: 42,
SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45, SEQ ID NO: 46,
SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 50,
SEQ ID NO: 51, SEQ ID NO: 52, SEQ ID NO: 53, SEQ ID NO: 54,
SEQ ID NO: 55, SEQ ID NO: 56, SEQ ID NO: 57, SEQ ID NO: 58,
SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, SEQ ID NO: 62,
SEQ ID NO: 63, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66,
SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, SEQ ID NO: 70,
SEQ ID NO: 71, SEQ ID NO: 72, SEQ ID NO: 73, SEQ ID NO: 74,
SEQ ID NO: 75, SEQ ID NO: 76, SEQ ID NO: 77, SEQ ID NO: 78,
SEQ ID NO: 79, SEQ ID NO: 80, SEQ ID NO: 81, SEQ ID NO: 82,
SEQ ID NO: 83, SEQ ID NO: 84, SEQ ID NO: 85, SEQ ID NO: 86,
SEQ ID NO: 87, SEQ ID NO: 88, SEQ ID NO: 89, SEQ ID NO: 90,
SEQ ID NO: 91, SEQ ID NO: 92, SEQ ID NO: 93, SEQ ID NO: 94,
SEQ ID NO: 95, SEQ ID NO: 96, SEQ ID NO: 97, SEQ ID NO: 98,
SEQ ID NO: 99, SEQ ID NO: 100, SEQ ID NO: 101, SEQ ID NO:
102, SEQ ID NO: 103, SEQ ID NO: 104, SEQ ID NO: 105,
SEQ ID NO: 106, SEQ ID NO: 107, SEQ ID NO: 108, SEQ ID NO: 109,
SEQ ID NO: 110, SEQ ID NO: 111, SEQ ID NO: 112, SEQ ID NO: 113,
SEQ ID NO: 114, SEQ ID NO: 115, SEQ ID NO: 116, SEQ ID NO: 117,
SEQ ID NO: 118, SEQ ID NO: 119, SEQ ID NO: 120, SEQ ID NO: 121,
SEQ ID NO: 122, SEQ ID NO: 123, SEQ ID NO: 124, SEQ ID NO: 125,
SEQ ID NO: 126, SEQ ID NO: 127, SEQ ID NO: 128, SEQ ID NO: 129,
SEQ ID NO: 130, SEQ ID NO: 131, SEQ ID NO: 133, SEQ ID NO: 134,
SEQ ID NO: 135, SEQ ID NO: 136, SEQ ID NO: 137, SEQ ID NO: 138,
SEQ ID NO: 139, SEQ ID NO: 140, SEQ ID NO: 141, SEQ ID NO: 142,
SEQ ID NO: 143, SEQ ID NO: 144, SEQ ID NO: 145, SEQ ID NO: 146,
SEQ ID NO: 147, SEQ ID NO: 148, SEQ ID NO: 149, SEQ ID NO: 150,
SEQ ID NO: 151, SEQ ID NO: 152, SEQ ID NO: 153, SEQ ID NO: 154,
SEQ ID NO: 155, SEQ ID NO: 156, SEQ ID NO: 157, SEQ ID NO: 158,
SEQ ID NO: 159, SEQ ID NO: 160, SEQ ID NO: 161, SEQ ID NO: 162,
SEQ ID NO: 163, SEQ ID NO: 164, SEQ ID NO: 165, SEQ ID NO: 166,
SEQ ID NO: 167, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 170,
SEQ ID NO: 171, SEQ ID NO: 172, SEQ ID NO: 173, SEQ ID NO: 174,
SEQ ID NO: 175, SEQ ID NO: 176, SEQ ID NO: 177, SEQ ID NO: 178,
SEQ ID NO: 179, SEQ ID NO: 180, SEQ ID NO: 181, SEQ ID NO: 182,
SEQ ID NO: 183, SEQ ID NO: 184, SEQ ID NO: 185, SEQ ID NO: 186,
SEQ ID NO: 187, SEQ ID NO: 188, SEQ ID NO: 189, SEQ ID NO: 190,
SEQ ID NO: 191, SEQ ID NO: 192, SEQ ID NO: 193, SEQ ID NO: 194,
SEQ ID NO: 195, SEQ ID NO: 196, SEQ ID NO: 197, SEQ ID NO: 198,
SEQ ID NO: 199, SEQ ID NO: 200, SEQ ID NO: 201, SEQ ID NO: 202,
SEQ ID NO: 203, SEQ ID NO: 204, SEQ ID NO: 205, SEQ ID NO: 206,
SEQ ID NO: 207, SEQ ID NO: 208, SEQ ID NO: 209, SEQ ID NO: 210,
SEQ ID NO: 211, SEQ ID NO: 212, SEQ ID NO: 213, SEQ ID NO: 214,
SEQ ID NO: 215, SEQ ID NO: 216, SEQ ID NO: 217, SEQ ID NO: 218,
SEQ ID NO: 219, SEQ ID NO: 220, SEQ ID NO: 221, SEQ ID NO: 222,
SEQ ID NO: 223, SEQ ID NO: 224, SEQ ID NO: 225, SEQ ID NO: 226,
SEQ ID NO: 227, SEQ ID NO: 228, SEQ ID NO: 229, SEQ ID NO: 230,
SEQ ID NO: 231, SEQ ID NO: 232, SEQ ID NO: 233, SEQ ID NO: 234,
SEQ ID NO: 235, SEQ ID NO: 236, SEQ ID NO: 237, SEQ ID NO: 238,
SEQ ID NO: 239, SEQ ID NO: 240, SEQ ID NO: 241, SEQ ID NO: 242,
SEQ ID NO: 243, SEQ ID NO: 245, SEQ ID NO: 246, SEQ ID NO: 247,
SEQ ID NO: 248, SEQ ID NO: 249, SEQ ID NO: 250, SEQ ID NO: 251,
SEQ ID NO: 252, SEQ ID NO: 253, SEQ ID NO: 254, SEQ ID NO: 255,
SEQ ID NO: 256, SEQ ID NO: 257, SEQ ID NO: 258, SEQ ID NO: 259,
SEQ ID NO: 260, SEQ ID NO: 261, SEQ ID NO: 262, SEQ ID NO: 263,
SEQ ID NO: 264, SEQ ID NO: 265, SEQ ID NO: 266, SEQ ID NO: 267,
SEQ ID NO: 268, SEQ ID NO: 269, SEQ ID NO: 270, SEQ ID NO: 271,
SEQ ID NO: 272, SEQ ID NO: 273, SEQ ID NO: 274, SEQ ID NO: 275,
SEQ ID NO: 276, SEQ ID NO: 277, SEQ ID NO: 278, SEQ ID NO: 279,
SEQ ID NO: 280, SEQ ID NO: 281, SEQ ID NO: 282, SEQ ID NO: 283,
SEQ ID NO: 284, SEQ ID NO: 285, SEQ ID NO: 286, SEQ ID NO: 287,
SEQ ID NO: 288, SEQ ID NO: 289, SEQ ID NO: 290, SEQ ID NO: 291,
SEQ ID NO: 292, SEQ ID NO: 293, SEQ ID NO: 294, SEQ ID NO: 295,
SEQ ID NO: 296, SEQ ID NO: 297, SEQ ID NO: 298, SEQ ID NO: 299,
SEQ ID NO: 300, SEQ ID NO: 301, SEQ ID NO: 302, SEQ ID NO: 303,
SEQ ID NO: 304, SEQ ID NO: 305, SEQ ID NO: 306, SEQ ID NO: 307,
SEQ ID NO: 308, SEQ ID NO: 309, SEQ ID NO: 310, SEQ ID NO: 311,
SEQ ID NO: 312, SEQ ID NO: 313, SEQ ID NO: 314, SEQ ID NO: 315,
SEQ ID NO: 316, SEQ ID NO: 317, SEQ ID NO: 318, SEQ ID NO: 319,
SEQ ID NO: 320, SEQ ID NO: 321, SEQ ID NO: 322, SEQ ID NO: 323,
SEQ ID NO: 324, SEQ ID NO: 325, SEQ ID NO: 326, SEQ ID NO: 327,
SEQ ID NO: 328, SEQ ID NO: 329, SEQ ID NO: 330, SEQ ID NO: 331,
SEQ ID NO: 332, SEQ ID NO: 333, SEQ ID NO: 334, SEQ ID NO: 335,
SEQ ID NO: 336, SEQ ID NO: 338, SEQ ID NO: 339, SEQ ID NO: 340,
SEQ ID NO: 341, SEQ ID NO: 342, SEQ ID NO: 343, SEQ ID NO: 344,
SEQ ID NO: 345, SEQ ID NO: 346, SEQ ID NO: 347, SEQ ID NO: 348,
SEQ ID NO: 349, SEQ ID NO: 350, SEQ ID NO: 351, SEQ ID NO: 352,
SEQ ID NO: 353, SEQ ID NO: 354, SEQ ID NO: 355, SEQ ID NO: 356,
SEQ ID NO: 357, SEQ ID NO: 358, SEQ ID NO: 359, SEQ ID NO: 360,
SEQ ID NO: 361, SEQ ID NO: 362, SEQ ID NO: 363, SEQ ID NO: 364,
SEQ ID NO: 365, SEQ ID NO: 366, SEQ ID NO: 367, SEQ ID NO: 368,
SEQ ID NO: 369, SEQ ID NO: 370, SEQ ID NO: 371, SEQ ID NO: 372,
SEQ ID NO: 373, SEQ ID NO: 374, SEQ ID NO: 375, SEQ ID NO: 376,
SEQ ID NO: 377, SEQ ID NO: 378, SEQ ID NO: 379, SEQ ID NO: 380,
SEQ ID NO: 381, SEQ ID NO: 382, SEQ ID NO: 383, SEQ ID NO: 384,
SEQ ID NO: 385, SEQ ID NO: 386, SEQ ID NO: 387, SEQ ID NO: 388,
SEQ ID NO: 389, SEQ ID NO: 390, SEQ ID NO: 391, SEQ ID NO: 392,
SEQ ID NO: 393, SEQ ID NO: 394, SEQ ID NO: 395, SEQ ID NO: 396,
SEQ ID NO: 397, SEQ ID NO: 398, SEQ ID NO: 399, SEQ ID NO: 400,
SEQ ID NO: 401, SEQ ID NO: 402, SEQ ID NO: 403, SEQ ID NO: 404,
SEQ ID NO: 405, SEQ ID NO: 406, SEQ ID NO: 407, SEQ ID NO: 408,
SEQ ID NO: 409, SEQ ID NO: 411, SEQ ID NO: 412, SEQ ID NO: 413,
SEQ ID NO: 414, SEQ ID NO: 415, SEQ ID NO: 416, SEQ ID NO: 417,
SEQ ID NO: 418, SEQ ID NO: 419, SEQ ID NO: 420, SEQ ID NO: 421,
SEQ ID NO: 422, SEQ ID NO: 423, SEQ ID NO: 424, SEQ ID NO: 425,
SEQ ID NO: 426, SEQ ID NO: 427, SEQ ID NO: 428, SEQ ID NO: 429,
SEQ ID NO: 430, SEQ ID NO: 431, SEQ ID NO: 432, SEQ ID NO: 433,
SEQ ID NO: 434, SEQ ID NO: 435, SEQ ID NO: 436, SEQ ID NO: 437,
SEQ ID NO: 438, SEQ ID NO: 439, SEQ ID NO: 440, SEQ ID NO: 441,
SEQ ID NO: 442, SEQ ID NO: 443, SEQ ID NO: 444, SEQ ID NO: 445,
SEQ ID NO: 446, SEQ ID NO: 447, SEQ ID NO: 448, SEQ ID NO: 449,
SEQ ID NO: 450, SEQ ID NO: 451, SEQ ID NO: 452, SEQ ID NO: 453,
SEQ ID NO: 454, SEQ ID NO: 455, SEQ ID NO: 456, SEQ ID NO: 457,
SEQ ID NO: 458, SEQ ID NO: 459, SEQ ID NO: 460, SEQ ID NO: 461,
SEQ ID NO: 462, SEQ ID NO: 463, SEQ ID NO: 464, SEQ ID NO: 465,
SEQ ID NO: 466, SEQ ID NO: 467, SEQ ID NO: 468, SEQ ID NO: 469,
SEQ ID NO: 470, SEQ ID NO: 471, SEQ ID NO: 472, SEQ ID NO: 473,
SEQ ID NO: 474, SEQ ID NO: 475, SEQ ID NO: 476, SEQ ID NO: 477,
SEQ ID NO: 478, SEQ ID NO: 479, SEQ ID NO: 480, SEQ ID NO: 481,
SEQ ID NO: 482, SEQ ID NO: 483, SEQ ID NO: 485, SEQ ID NO: 486,
SEQ ID NO: 487, SEQ ID NO: 488, SEQ ID NO: 489, SEQ ID NO: 490,
SEQ ID NO: 491, SEQ ID NO: 492, SEQ ID NO: 493, SEQ ID NO: 494,
SEQ ID NO: 495, SEQ ID NO: 496, SEQ ID NO: 497, SEQ ID NO: 498,
SEQ ID NO: 499, SEQ ID NO: 500, SEQ ID NO: 501, SEQ ID NO: 502,
SEQ ID NO: 503, SEQ ID NO: 504, SEQ ID NO: 505, SEQ ID NO: 506,
SEQ ID NO: 507, SEQ ID NO: 508, SEQ ID NO: 509, SEQ ID NO: 510,
SEQ ID NO: 511, SEQ ID NO: 512, SEQ ID NO: 513, SEQ ID NO: 514,
SEQ ID NO: 515, SEQ ID NO: 516, SEQ ID NO: 517, SEQ ID NO: 518,
SEQ ID NO: 519, SEQ ID NO: 520, SEQ ID NO: 521, SEQ ID NO: 522,
SEQ ID NO: 523, SEQ ID NO: 524, SEQ ID NO: 525, SEQ ID NO: 526,
SEQ ID NO: 527, SEQ ID NO: 528, SEQ ID NO: 529, SEQ ID NO: 530,
SEQ ID NO: 531, SEQ ID NO: 532, SEQ ID NO: 533, SEQ ID NO: 534,
SEQ ID NO: 535, SEQ ID NO: 536, SEQ ID NO: 537, SEQ ID NO: 538,
SEQ ID NO: 539, SEQ ID NO: 540, SEQ ID NO: 541, SEQ ID NO: 542,
SEQ ID NO: 543, SEQ ID NO: 544, SEQ ID NO: 545, SEQ ID NO: 546,
SEQ ID NO: 547, SEQ ID NO: 548, SEQ ID NO: 549, SEQ ID NO: 550,
SEQ ID NO: 551, SEQ ID NO: 552, SEQ ID NO: 553, SEQ ID NO: 555,
SEQ ID NO: 556, SEQ ID NO: 557, SEQ ID NO: 558, SEQ ID NO: 559,
SEQ ID NO: 560, SEQ ID NO: 561, SEQ ID NO: 562, SEQ ID NO: 563,
SEQ ID NO: 564, SEQ ID NO: 565, SEQ ID NO: 566, SEQ ID NO: 567,
SEQ ID NO: 568, SEQ ID NO: 569, SEQ ID NO: 570, SEQ ID NO: 571,
SEQ ID NO: 572, SEQ ID NO: 573, SEQ ID NO: 574, SEQ ID NO: 575,
SEQ ID NO: 576, SEQ ID NO: 577, SEQ ID NO: 578, SEQ ID NO: 579,
SEQ ID NO: 580, SEQ ID NO: 581, SEQ ID NO: 582, SEQ ID NO: 583,
SEQ ID NO: 584, SEQ ID NO: 585, SEQ ID NO: 586, SEQ ID NO: 587,
SEQ ID NO: 588, SEQ ID NO: 589, SEQ ID NO: 590, SEQ ID NO: 591,
SEQ ID NO: 592, SEQ ID NO: 593, SEQ ID NO: 594, SEQ ID NO: 595,
SEQ ID NO: 596, SEQ ID NO: 597, SEQ ID NO: 598, SEQ ID NO: 599,
SEQ ID NO: 600, SEQ ID NO: 601, SEQ ID NO: 602, SEQ ID NO: 603,
SEQ ID NO: 604, SEQ ID NO: 605, SEQ ID NO: 606, SEQ ID NO: 607,
SEQ ID NO: 608, SEQ ID NO: 609, SEQ ID NO: 610, SEQ ID NO: 611,
SEQ ID NO: 612, SEQ ID NO: 613, SEQ ID NO: 614, SEQ ID NO: 615,
SEQ ID NO: 616, SEQ ID NO: 617, SEQ ID NO: 618, SEQ ID NO: 619,
SEQ ID NO: 620, SEQ ID NO: 621, SEQ ID NO: 622, SEQ ID NO: 623,
SEQ ID NO: 624, SEQ ID NO: 625, SEQ ID NO: 626, SEQ ID NO: 627,
SEQ ID NO: 628, SEQ ID NO: 629, SEQ ID NO: 631, SEQ ID NO: 632,
SEQ ID NO: 633, SEQ ID NO: 634, SEQ ID NO: 635, SEQ ID NO: 636,
SEQ ID NO: 637, SEQ ID NO: 638, SEQ ID NO: 639, SEQ ID NO: 640,
SEQ ID NO: 641, SEQ ID NO: 642, SEQ ID NO: 643, SEQ ID NO: 644,
SEQ ID NO: 645, SEQ ID NO: 646, SEQ ID NO: 647, SEQ ID NO: 648,
SEQ ID NO: 649, SEQ ID NO: 650, SEQ ID NO: 651, SEQ ID NO: 652,
SEQ ID NO: 653, SEQ ID NO: 654, SEQ ID NO: 655, SEQ ID NO: 656,
SEQ ID NO: 657, SEQ ID NO: 658, SEQ ID NO: 659, SEQ ID NO: 660,
SEQ ID NO: 661, SEQ ID NO: 662, SEQ ID NO: 663, SEQ ID NO: 664,
SEQ ID NO: 665, SEQ ID NO: 666, SEQ ID NO: 667, SEQ ID NO: 668,
SEQ ID NO: 669, SEQ ID NO: 670, SEQ ID NO: 671, SEQ ID NO: 672,
SEQ ID NO: 673, SEQ ID NO: 674, SEQ ID NO: 675, SEQ ID NO: 676,
SEQ ID NO: 677, SEQ ID NO: 678, SEQ ID NO: 679, SEQ ID NO: 680,
SEQ ID NO: 681, SEQ ID NO: 682, SEQ ID NO: 683, SEQ ID NO: 684,
SEQ ID NO: 685, SEQ ID NO: 686, SEQ ID NO: 687, SEQ ID NO: 688,
SEQ ID NO: 690, SEQ ID NO: 691, SEQ ID NO: 692, SEQ ID NO: 693,
SEQ ID NO: 694, SEQ ID NO: 695, SEQ ID NO: 696, SEQ ID NO: 697,
SEQ ID NO: 698, SEQ ID NO: 699, SEQ ID NO: 700, SEQ ID NO: 701,
SEQ ID NO: 702, SEQ ID NO: 703, SEQ ID NO: 704, SEQ ID NO: 705,
SEQ ID NO: 706, SEQ ID NO: 707, SEQ ID NO: 708, SEQ ID NO: 709,
SEQ ID NO: 710, SEQ ID NO: 711, SEQ ID NO: 712, SEQ ID NO: 713,
SEQ ID NO: 714, SEQ ID NO: 715, SEQ ID NO: 716, SEQ ID NO: 717,
SEQ ID NO: 718, SEQ ID NO: 719, SEQ ID NO: 720, SEQ ID NO: 721,
SEQ ID NO: 722, SEQ ID NO: 723, SEQ ID NO: 724, SEQ ID NO: 725,
SEQ ID NO: 726, SEQ ID NO: 727, SEQ ID NO: 728, SEQ ID NO: 729,
SEQ ID NO: 730, SEQ ID NO: 731, SEQ ID NO: 732, SEQ ID NO: 733,
SEQ ID NO: 734, SEQ ID NO: 735, SEQ ID NO: 736, SEQ ID NO: 737,
SEQ ID NO: 738, SEQ ID NO: 739, SEQ ID NO: 740, SEQ ID NO: 741,
SEQ ID NO: 742, SEQ ID NO: 743, SEQ ID NO: 744, SEQ ID NO: 745,
SEQ ID NO: 746, SEQ ID NO: 747, SEQ ID NO: 748, SEQ ID NO: 749,
SEQ ID NO: 750, SEQ ID NO: 751, SEQ ID NO: 752, SEQ ID NO: 753,
SEQ ID NO: 754, SEQ ID NO: 756, SEQ ID NO: 757, SEQ ID NO: 758,
SEQ ID NO: 759, SEQ ID
NO: 760, SEQ ID NO: 761, SEQ ID NO: 762, SEQ ID NO: 763,
SEQ ID NO: 764, SEQ ID NO: 765, SEQ ID NO: 766, SEQ ID NO: 767,
SEQ ID NO: 768, SEQ ID NO: 769, SEQ ID NO: 770, SEQ ID NO: 771,
SEQ ID NO: 772, SEQ ID NO: 773, SEQ ID NO: 774, SEQ ID NO: 775,
SEQ ID NO: 776, SEQ ID NO: 777, SEQ ID NO: 778, SEQ ID NO: 779,
SEQ ID NO: 780, SEQ ID NO: 781, SEQ ID NO: 782, SEQ ID NO: 783,
SEQ ID NO: 784, SEQ ID NO: 785, SEQ ID NO: 786, SEQ ID NO: 787,
SEQ ID NO: 788, SEQ ID NO: 789, SEQ ID NO: 790, SEQ ID NO: 791,
SEQ ID NO: 792, SEQ ID NO: 793, SEQ ID NO: 794, SEQ ID NO: 795,
SEQ ID NO: 796, SEQ ID NO: 797, SEQ ID NO: 798, SEQ ID NO: 799,
SEQ ID NO: 800, SEQ ID NO: 801, SEQ ID NO: 802, SEQ ID NO: 803,
SEQ ID NO: 804, SEQ ID NO: 805, SEQ ID NO: 806, SEQ ID NO: 807,
SEQ ID NO: 808, SEQ ID NO: 809, SEQ ID NO: 810, SEQ ID NO: 811,
SEQ ID NO: 812, SEQ ID NO: 813, SEQ ID NO: 814, SEQ ID NO: 815,
SEQ ID NO: 817, SEQ ID NO: 818, SEQ ID NO: 819, SEQ ID NO: 820,
SEQ ID NO: 821, SEQ ID NO: 822, SEQ ID NO: 823, SEQ ID NO: 824,
SEQ ID NO: 825, SEQ ID NO: 826, SEQ ID NO: 827, SEQ ID NO: 828,
SEQ ID NO: 829, SEQ ID NO: 830, SEQ ID NO: 831, SEQ ID NO: 832,
SEQ ID NO: 833, SEQ ID NO: 834, SEQ ID NO: 835, SEQ ID NO: 836,
SEQ ID NO: 837, SEQ ID NO: 838, SEQ ID NO: 839, SEQ ID NO: 840,
SEQ ID NO: 841, SEQ ID NO: 842, SEQ ID NO: 843, SEQ ID NO: 844,
SEQ ID NO: 845, SEQ ID NO: 846, SEQ ID NO: 847, SEQ ID NO: 848,
SEQ ID NO: 849, SEQ ID NO: 850, SEQ ID NO: 851, SEQ ID NO: 852,
SEQ ID NO: 853, SEQ ID NO: 854, SEQ ID NO: 855, SEQ ID NO: 856,
SEQ ID NO: 857, SEQ ID NO: 858, SEQ ID NO: 859, SEQ ID NO: 860,
SEQ ID NO: 861, SEQ ID NO: 862, SEQ ID NO: 863, SEQ ID NO: 864,
SEQ ID NO: 865, SEQ ID NO: 866, SEQ ID NO: 867, SEQ ID NO: 868,
SEQ ID NO: 869, SEQ ID NO: 870, SEQ ID NO: 871, SEQ ID NO: 872,
SEQ ID NO: 873, SEQ ID NO: 874, SEQ ID NO: 875, SEQ ID NO: 877,
SEQ ID NO: 878, SEQ ID NO: 879, SEQ ID NO: 880, SEQ ID NO: 881,
SEQ ID NO: 882, SEQ ID NO: 883, SEQ ID NO: 884, SEQ ID NO: 885,
SEQ ID NO: 886, SEQ ID NO: 887, SEQ ID NO: 888, SEQ ID NO: 889,
SEQ ID NO: 890, SEQ ID NO: 891, SEQ ID NO: 892, SEQ ID NO: 893,
SEQ ID NO: 894, SEQ ID NO: 895, SEQ ID NO: 896, SEQ ID NO: 897,
SEQ ID NO: 898, SEQ ID NO: 899, SEQ ID NO: 900, SEQ ID NO: 901,
SEQ ID NO: 902, SEQ ID NO: 903, SEQ ID NO: 904, SEQ ID NO: 905,
SEQ ID NO: 906, SEQ ID NO: 907, SEQ ID NO: 908, SEQ ID NO: 909,
SEQ ID NO: 910, SEQ ID NO: 911, SEQ ID NO: 912, SEQ ID NO: 913,
SEQ ID NO: 914, SEQ ID NO: 915, SEQ ID NO: 916, SEQ ID NO: 917,
SEQ ID NO: 918, SEQ ID NO: 919, SEQ ID NO: 920, SEQ ID NO: 921,
SEQ ID NO: 922, SEQ ID NO: 923, SEQ ID NO: 924, SEQ ID NO: 925,
SEQ ID NO: 926, SEQ ID NO: 928, SEQ ID NO: 929, SEQ ID NO: 930,
SEQ ID NO: 931, SEQ ID NO: 932, SEQ ID NO: 933, SEQ ID NO: 934,
SEQ ID NO: 935, SEQ ID NO: 936, SEQ ID NO: 937, SEQ ID NO: 938,
SEQ ID NO: 939, SEQ ID NO: 940, SEQ ID NO: 941, SEQ ID NO: 942,
SEQ ID NO: 943, SEQ ID NO: 944, SEQ ID NO: 945, SEQ ID NO: 946,
SEQ ID NO: 947, SEQ ID NO: 948, SEQ ID NO: 949, SEQ ID NO: 950,
SEQ ID NO: 951, SEQ ID NO: 952, SEQ ID NO: 953, SEQ ID NO: 954,
SEQ ID NO: 955, SEQ ID NO: 956, SEQ ID NO: 957, SEQ ID NO: 958,
SEQ ID NO: 959, SEQ ID NO: 960, SEQ ID NO: 961, SEQ ID NO: 962,
SEQ ID NO: 963, SEQ ID NO: 964, SEQ ID NO: 965, SEQ ID NO: 966,
SEQ ID NO: 967, SEQ ID NO: 968, SEQ ID NO: 969, SEQ ID NO: 970,
SEQ ID NO: 971, SEQ ID NO: 972, SEQ ID NO: 973, SEQ ID NO: 974,
SEQ ID NO: 975, SEQ ID NO: 976, SEQ ID NO: 977, SEQ ID NO: 979,
SEQ ID NO: 980, SEQ ID NO: 981, SEQ ID NO: 982, SEQ ID NO: 983,
SEQ ID NO: 984, SEQ ID NO: 985, SEQ ID NO: 986, SEQ ID NO: 987,
SEQ ID NO: 988, SEQ ID NO: 989, SEQ ID NO: 990, SEQ ID NO: 991,
SEQ ID NO: 992, SEQ ID NO: 993, SEQ ID NO: 994, SEQ ID NO: 995,
SEQ ID NO: 996, SEQ ID NO: 997, SEQ ID NO: 998,
SEQ ID NO: 999, SEQ ID NO: 1000, SEQ ID NO: 1001,
SEQ ID NO: 1002, SEQ ID NO: 1003, SEQ ID NO: 1004, SEQ ID NO: 1005,
SEQ ID NO: 1006, SEQ ID NO: 1007, SEQ ID NO: 1008,
SEQ ID NO: 1009, SEQ ID NO: 1010, SEQ ID NO: 1011, SEQ ID NO: 1012,
SEQ ID NO: 1014, SEQ ID NO: 1015, SEQ ID NO: 1016,
SEQ ID NO: 1017, SEQ ID NO: 1018, SEQ ID NO: 1019, SEQ ID NO: 1020,
SEQ ID NO: 1021, SEQ ID NO: 1022, SEQ ID NO: 1023,
SEQ ID NO: 1024, SEQ ID NO: 1025, SEQ ID NO: 1026, SEQ ID NO: 1027,
SEQ ID NO: 1028, SEQ ID NO: 1030, SEQ ID NO: 1031,
SEQ ID NO: 1032, SEQ ID NO: 1033, SEQ ID NO: 1034, SEQ ID NO: 1035,
SEQ ID NO: 1036, SEQ ID NO: 1037, SEQ ID NO: 1038,
SEQ ID NO: 1039, SEQ ID NO: 1040, SEQ ID NO: 1041, SEQ ID NO: 1042,
SEQ ID NO: 1043, SEQ ID NO: 1044, SEQ ID NO: 1045,
SEQ ID NO: 1046, SEQ ID NO: 1047, SEQ ID NO: 1048, SEQ ID NO: 1049,
SEQ ID NO: 1050, SEQ ID NO: 1051, SEQ ID NO: 1052,
SEQ ID NO: 1053, SEQ ID NO: 1054, SEQ ID NO: 1056, SEQ ID NO: 1057,
SEQ ID NO: 1058, SEQ ID NO: 1059, SEQ ID NO: 1061,
SEQ ID NO: 1062, SEQ ID NO: 1063, SEQ ID NO: 1064, SEQ ID NO: 1065,
SEQ ID NO: 1066, SEQ ID NO: 1067, SEQ ID NO: 1068,
SEQ ID NO: 1069, SEQ ID NO: 1070, SEQ ID NO: 1071,
SEQ ID NO: 1072, SEQ ID NO: 1073, SEQ ID NO: 1074, SEQ ID NO: 1075,
SEQ ID NO: 1076, SEQ ID NO: 1077, SEQ ID NO: 1078,
SEQ ID NO: 1079, SEQ ID NO: 1080, SEQ ID NO: 1081, SEQ ID NO: 1082,
SEQ ID NO: 1083, SEQ ID NO: 1084, SEQ ID NO: 1085,
SEQ ID NO: 1086, SEQ ID NO: 1087, SEQ ID NO: 1088,
SEQ ID NO: 1089, SEQ ID NO: 1090, SEQ ID NO: 1091, SEQ ID NO: 1092,
SEQ ID NO: 1094, SEQ ID NO: 1095, SEQ ID NO: 1096,
SEQ ID NO: 1097, SEQ ID NO: 1098, SEQ ID NO: 1099, SEQ ID NO: 1100,
SEQ ID NO: 1101, SEQ ID NO: 1102, SEQ ID NO: 1103,
SEQ ID NO: 1104, SEQ ID NO: 1105, SEQ ID NO: 1106, SEQ ID NO: 1107,
SEQ ID NO: 1108, SEQ ID NO: 1109, SEQ ID NO: 1110,
SEQ ID NO: 1111, SEQ ID NO: 1112, SEQ ID NO: 1113, SEQ ID NO: 1114,
SEQ ID NO: 1115, SEQ ID NO: 1116, SEQ ID NO: 1117,
SEQ ID NO: 1118, SEQ ID NO: 1119, SEQ ID NO: 1120, SEQ ID NO: 1121,
SEQ ID NO: 1122, SEQ ID NO: 1123, SEQ ID NO: 1124,
SEQ ID NO: 1125, SEQ ID NO: 1126, SEQ ID NO: 1127,
SEQ ID NO: 1129, SEQ ID NO: 1130, SEQ ID NO: 1131, SEQ ID NO: 1132,
SEQ ID NO: 1133, SEQ ID NO: 1134, SEQ ID NO: 1135,
SEQ ID NO: 1136, SEQ ID NO: 1137, SEQ ID NO: 1138, SEQ ID NO: 1139,
SEQ ID NO: 1140, SEQ ID NO: 1141, SEQ ID NO: 1142,
SEQ ID NO: 1143, SEQ ID NO: 1144, SEQ ID NO: 1145,
SEQ ID NO: 1146, SEQ ID NO: 1147, SEQ ID NO: 1148, SEQ ID NO: 1149,
SEQ ID NO: 1150, SEQ ID NO: 1151, SEQ ID NO: 1152,
SEQ ID NO: 1153, SEQ ID NO: 1154, SEQ ID NO: 1155, SEQ ID NO: 1156,
SEQ ID NO: 1158, SEQ ID NO: 1159, SEQ ID NO: 1160,
SEQ ID NO: 1161, SEQ ID NO: 1162, SEQ ID NO: 1163, SEQ ID NO: 1164,
SEQ ID NO: 1165, SEQ ID NO: 1166, SEQ ID NO: 1167,
SEQ ID NO: 1168, SEQ ID NO: 1169, SEQ ID NO: 1170, SEQ ID NO: 1171,
SEQ ID NO: 1172, SEQ ID NO: 1173, SEQ ID NO: 1174,
SEQ ID NO: 1175, SEQ ID NO: 1176, SEQ ID NO: 1177, SEQ ID NO: 1178,
SEQ ID NO: 1179, SEQ ID NO: 1180, SEQ ID NO: 1181,
SEQ ID NO: 1182, SEQ ID NO: 1183, SEQ ID NO: 1184,
SEQ ID NO: 1185, SEQ ID NO: 1186, SEQ ID NO: 1187, SEQ ID NO: 1188,
SEQ ID NO: 1189, SEQ ID NO: 1190, SEQ ID NO: 1192,
SEQ ID NO: 1193, SEQ ID NO: 1194, SEQ ID NO: 1195, SEQ ID NO: 1196,
SEQ ID NO: 1197, SEQ ID NO: 1198, SEQ ID NO: 1199,
SEQ ID NO: 1200, SEQ ID NO: 1201, SEQ ID NO: 1202,
SEQ ID NO: 1203, SEQ ID NO: 1204, SEQ ID NO: 1205, SEQ ID NO: 1206,
SEQ ID NO: 1207, SEQ ID NO: 1208, SEQ ID NO: 1209,
SEQ ID NO: 1210, SEQ ID NO: 1211, SEQ ID NO: 1213, SEQ ID NO: 1214,
SEQ ID NO: 1215, SEQ ID NO: 1216, SEQ ID NO: 1217,
SEQ ID NO: 1218, SEQ ID NO: 1219, SEQ ID NO: 1220, SEQ ID NO: 1221,
SEQ ID NO: 1222, SEQ ID NO: 1223, SEQ ID NO: 1224,
SEQ ID NO: 1225, SEQ ID NO: 1226, SEQ ID NO: 1227, SEQ ID NO: 1228,
SEQ ID NO: 1229, SEQ ID NO: 1230, SEQ ID NO: 1231,
SEQ ID NO: 1232, SEQ ID NO: 1233, SEQ ID NO: 1234, SEQ ID NO: 1235,
SEQ ID NO: 1236, SEQ ID NO: 1237, SEQ ID NO: 1238,
SEQ ID NO: 1239, SEQ ID NO: 1241, SEQ ID NO: 1242,
SEQ ID NO: 1243, SEQ ID NO: 1244, SEQ ID NO: 1245, SEQ ID NO: 1246,
SEQ ID NO: 1247, SEQ ID NO: 1248, SEQ ID NO: 1249,
SEQ ID NO: 1250, SEQ ID NO: 1251, SEQ ID NO: 1252, SEQ ID NO: 1253,
SEQ ID NO: 1254, SEQ ID NO: 1255, SEQ ID NO: 1256,
SEQ ID NO: 1257, SEQ ID NO: 1259, SEQ ID NO: 1260,
SEQ ID NO: 1261, SEQ ID NO: 1262, SEQ ID NO: 1263, SEQ ID NO: 1264,
SEQ ID NO: 1265, SEQ ID NO: 1266, SEQ ID NO: 1267,
SEQ ID NO: 1268, SEQ ID NO: 1269, SEQ ID NO: 1270, SEQ ID NO: 1271,
SEQ ID NO: 1272, SEQ ID NO: 1273, SEQ ID NO: 1275,
SEQ ID NO: 1276, SEQ ID NO: 1277, SEQ ID NO: 1278, SEQ ID NO: 1279,
SEQ ID NO: 1280, SEQ ID NO: 1281, SEQ ID NO: 1282,
SEQ ID NO: 1283, SEQ ID NO: 1284, SEQ ID NO: 1285, SEQ ID NO: 1286,
SEQ ID NO: 1287, SEQ ID NO: 1289, SEQ ID NO: 1290,
SEQ ID NO: 1291, SEQ ID NO: 1292, SEQ ID NO: 1293, SEQ ID NO: 1294,
SEQ ID NO: 1295, SEQ ID NO: 1296, SEQ ID NO: 1297,
SEQ ID NO: 1298, SEQ ID NO: 1299, SEQ ID NO: 1300,
SEQ ID NO: 1301, SEQ ID NO: 1303, SEQ ID NO: 1304, SEQ ID NO: 1305,
SEQ ID NO: 1306, SEQ ID NO: 1307, SEQ ID NO: 1308,
SEQ ID NO: 1310, SEQ ID NO: 1311, SEQ ID NO: 1312, SEQ ID NO: 1313,
SEQ ID NO: 1314, SEQ ID NO: 1315, SEQ ID NO: 1316,
SEQ ID NO: 1317, SEQ ID NO: 1318, SEQ ID NO: 1319,
SEQ ID NO: 1320, SEQ ID NO: 1322, SEQ ID NO: 1323, SEQ ID NO: 1324,
SEQ ID NO: 1325, SEQ ID NO: 1326, SEQ ID NO: 1327,
SEQ ID NO: 1328, SEQ ID NO: 1330, SEQ ID NO: 1331, SEQ ID NO: 1332,
SEQ ID NO: 1333, SEQ ID NO: 1334, SEQ ID NO: 1335,
SEQ ID NO: 1336, SEQ ID NO: 1337, SEQ ID NO: 1339, SEQ ID NO: 1340,
SEQ ID NO: 1341, SEQ ID NO: 1342, SEQ ID NO: 1343,
SEQ ID NO: 1344, SEQ ID NO: 1345, SEQ ID NO: 1346, SEQ ID NO: 1347,
SEQ ID NO: 1349, SEQ ID NO: 1350, SEQ ID NO: 1351,
SEQ ID NO: 1352, SEQ ID NO: 1353, SEQ ID NO: 1354, SEQ ID NO: 1355,
SEQ ID NO: 1356, SEQ ID NO: 1357, SEQ ID NO: 1358,
SEQ ID NO: 1360, SEQ ID NO: 1361, SEQ ID NO: 1362,
SEQ ID NO: 1363, SEQ ID NO: 1364, SEQ ID NO: 1365, SEQ ID NO: 1367,
SEQ ID NO: 1368, SEQ ID NO: 1369, SEQ ID NO: 1370,
SEQ ID NO: 1371, SEQ ID NO: 1375, SEQ ID NO: 1376, SEQ ID NO: 1377,
SEQ ID NO: 1378, SEQ ID NO: 1379, SEQ ID NO: 1381,
SEQ ID NO: 1382, SEQ ID NO: 1383, SEQ ID NO: 1384,
SEQ ID NO: 1385, SEQ ID NO: 1387, SEQ ID NO: 1388, SEQ ID NO: 1389,
SEQ ID NO: 1390, SEQ ID NO: 1391, SEQ ID NO: 1392,
SEQ ID NO: 1393, SEQ ID NO: 1395, SEQ ID NO: 1396, SEQ ID NO: 1397,
SEQ ID NO: 1398, SEQ ID NO: 1399, SEQ ID NO: 1400,
SEQ ID NO: 1402, SEQ ID NO: 1403, SEQ ID NO: 1404, SEQ ID NO: 1405,
SEQ ID NO: 1406, SEQ ID NO: 1407, SEQ ID NO: 1409,
SEQ ID NO: 1410, SEQ ID NO: 1412, SEQ ID NO: 1413, SEQ ID NO: 1414,
SEQ ID NO: 1415, SEQ ID NO: 1416, SEQ ID NO: 1417,
SEQ ID NO: 1419, SEQ ID NO: 1420, SEQ ID NO: 1421, SEQ ID NO: 1422,
SEQ ID NO: 1423, SEQ ID NO: 1424, SEQ ID NO: 1425,
SEQ ID NO: 1427, SEQ ID NO: 1428, SEQ ID NO: 1429,
SEQ ID NO: 1430, SEQ ID NO: 1431, SEQ ID NO: 1432, SEQ ID NO: 1433,
SEQ ID NO: 1434, SEQ ID NO: 1435, SEQ ID NO: 1437,
SEQ ID NO: 1438, SEQ ID NO: 1439, SEQ ID NO: 1440, SEQ ID NO: 1441,
SEQ ID NO: 1442, SEQ ID NO: 1444, SEQ ID NO: 1445,
SEQ ID NO: 1446, SEQ ID NO: 1447, SEQ ID NO: 1448,
SEQ ID NO: 1449, SEQ ID NO: 1451, SEQ ID NO: 1452, SEQ ID NO: 1453,
SEQ ID NO: 1454, SEQ ID NO: 1455, SEQ ID NO: 1456,
SEQ ID NO: 1458, SEQ ID NO: 1459, SEQ ID NO: 1461, SEQ ID NO: 1462,
SEQ ID NO: 1463, SEQ ID NO: 1464, SEQ ID NO: 1465,
SEQ ID NO: 1466, SEQ ID NO: 1468, SEQ ID NO: 1469, SEQ ID NO: 1470,
SEQ ID NO: 1472, SEQ ID NO: 1474, SEQ ID NO: 1475,
SEQ ID NO: 1476, SEQ ID NO: 1477, SEQ ID NO: 1479, SEQ ID NO: 1480,
SEQ ID NO: 1481, SEQ ID NO: 1482, SEQ ID NO: 1483,
SEQ ID NO: 1484, SEQ ID NO: 1485, SEQ ID NO: 1486, SEQ ID NO: 1488,
SEQ ID NO: 1490, SEQ ID NO: 1491, SEQ ID NO: 1492,
SEQ ID NO: 1493, SEQ ID NO: 1495, SEQ ID NO: 1496,
SEQ ID NO: 1497, SEQ ID NO: 1498, SEQ ID NO: 1500, SEQ ID NO: 1502,
SEQ ID NO: 1503, SEQ ID NO: 1504, SEQ ID NO: 1505,
SEQ ID NO: 1507, SEQ ID NO: 1509, SEQ ID NO: 1512, SEQ ID NO: 1513,
SEQ ID NO: 1514, SEQ ID NO: 1515, SEQ ID NO: 1517,
SEQ ID NO: 1518, SEQ ID NO: 1519, SEQ ID NO: 1521,
SEQ ID NO: 1522, SEQ ID NO: 1523, SEQ ID NO: 1524, SEQ ID NO: 1525,
SEQ ID NO: 1527, SEQ ID NO: 1528, SEQ ID NO: 1529,
SEQ ID NO: 1530, SEQ ID NO: 1531, SEQ ID NO: 1533, SEQ ID NO: 1534,
SEQ ID NO: 1535, SEQ ID NO: 1536, SEQ ID NO: 1538,
SEQ ID NO: 1539, SEQ ID NO: 1541, SEQ ID NO: 1542, SEQ ID NO: 1543,
SEQ ID NO: 1544, SEQ ID NO: 1546, SEQ ID NO: 1548,
SEQ ID NO: 1550, SEQ ID NO: 1552, SEQ ID NO: 1554, SEQ ID NO: 1556,
SEQ ID NO: 1557, SEQ ID NO: 1559, SEQ ID NO: 1560,
SEQ ID NO: 1561, SEQ ID NO: 1562, SEQ ID NO: 1564, SEQ ID NO: 1565,
SEQ ID NO: 1567, SEQ ID NO: 1568, SEQ ID NO: 1570,
SEQ ID NO: 1572, SEQ ID NO: 1573, SEQ ID NO: 1574,
SEQ ID NO: 1575, SEQ ID NO: 1577, SEQ ID NO: 1578, SEQ ID NO: 1579,
SEQ ID NO: 1581, SEQ ID NO: 1582, SEQ ID NO: 1583,
SEQ ID NO: 1585, SEQ ID NO: 1586, SEQ ID NO: 1588, SEQ ID NO: 1589,
SEQ ID NO: 1590, SEQ ID NO: 1592, SEQ ID NO: 1593,
SEQ ID NO: 1595, SEQ ID NO: 1597, SEQ ID NO: 1598,
SEQ ID NO: 1600, SEQ ID NO: 1602, SEQ ID NO: 1606, SEQ ID NO: 1608,
SEQ ID NO: 1609, SEQ ID NO: 1610, SEQ ID NO: 1611,
SEQ ID NO: 1613, SEQ ID NO: 1614, SEQ ID NO: 1616, SEQ ID NO: 1618,
SEQ ID NO: 1620, SEQ ID NO: 1621, SEQ ID NO: 1623,
SEQ ID NO: 1625, SEQ ID NO: 1628, SEQ ID NO: 1630, SEQ ID NO: 1631,
SEQ ID NO: 1633, SEQ ID NO: 1634, SEQ ID NO: 1638,
SEQ ID NO: 1641, SEQ ID NO: 1642, SEQ ID NO: 1644, SEQ ID NO: 1645,
SEQ ID NO: 1647, SEQ ID NO: 1648, SEQ ID NO: 1650,
SEQ ID NO: 1651, SEQ ID NO: 1653, SEQ ID NO: 1654, SEQ ID NO: 1656,
SEQ ID NO: 1657, SEQ ID NO: 1659, SEQ ID NO: 1661,
SEQ ID NO: 1663, SEQ ID NO: 1665, SEQ ID NO: 1667,
SEQ ID NO: 1671, SEQ ID NO: 1674, SEQ ID NO: 1676, SEQ ID NO: 1678,
SEQ ID NO: 1679, SEQ ID NO: 1681, SEQ ID NO: 1684,
SEQ ID NO: 1686, SEQ ID NO: 1687, SEQ ID NO: 1689, SEQ ID NO: 1692,
SEQ ID NO: 1693, SEQ ID NO: 1695, SEQ ID NO: 1697,
SEQ ID NO: 1698, SEQ ID NO: 1702, and SEQ ID NO: 1703, or

(b) a polypeptide comprising an amino acid sequence in which several amino acids of an amino acid sequence set forth in (a) are deleted, replaced or added.

4. A polypeptide specific to enterohemorrhagic pathogenic E. coli O157:H7.

5. The polypeptide of claim 4 comprising
(a) an amino acid sequence selected from a group comprising the following SEQ IDs or a moiety thereof: SEQ ID
NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: [...]
SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 50, [...]
SEQ ID NO: 100, SEQ ID NO: 101, SEQ ID NO:
102, SEQ ID NO: 103, SEQ ID NO: 104, SEQ ID NO: 105, SEQ ID 
NO: 106, SEQ ID NO: 107, SEQ ID NO: 108, SEQ ID NO: 109, 
SEQ ID NO: 110, SEQ. ID NO: 111, SEQ ID NO: 112, SEQ ID NO: 
610 .113, SEQ ID NO: 114, SEQ ID NO: 115, SEQ ID NO : 1 16, SEQ ID 
NO: 117, SEQ ID NO: 118, SEQ ID NO: 119, SEQ. ID NO: 120, 
SEQ ID NO: 121, SEQ ID NO: 122, SEQ ID NO: 123, SEQ ID NO: 
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124, SEQ ID NO: 125, SEQ ID NO: 126, SEQ ID NO: 127, SEQ ID 
NO: 128, SEQ ID NO: 129, SEQ ID NO: 130, SEQ ID NO: 131, 

615 SEQ ID NO: 133, SEQ ID NO: 134, SEQ ID NO: 135, SEQ ID NO: 
136, SEQ. ID NO: 137, SEQ. ID NO: 138, SEQ ID NO: 139, SEQ ID 
NO: 140, SEQ ID NO: 141, SEQ ID NO: 142, SEQ ID NO: 143, 
SEQ ID NO: 144, SEQ ID NO: 145, SEQ ID NO: 146, SEQ ID NO: 
147, SEQ ID NO: 148, SEQ ID NO: 149, SEQ ID NO: 150, SEQ ID 

620 NO: 151, SEQ ID NO: 152, SEQ ID NO: 153, SEQ ID NO: 154, 
SEQ ID NO: 155, SEQ ID NO : 156, SEQ ID NO: 157, SEQ ID NO: 
158, SEQ ID NO: 159, SEQ ID NO: 160, SEQ ID NO: 161, SEQ ID 
NO: 162, SEQ ID NO: 163, SEQ. ID NO: 164, SEQ ID NO: 165, 
SEQ ID NO : 166. SEQ ID NO: 167, SEQ ID NO: 168, SEQ ID NO: 

625 169, SEQ ID NO: 170, SEQ ID NO: 171, SEQ ID NO: 172, SEQ ID 
NO : 173, SEQ ID NO: 174, SEQ ID NO: 175, SEQ ID NO: 176, 
SEQ ID NO: 177, SEQ, ID NO: 178, SEQ, ID NO: 179, SEQ ID NO: 
180, SEQ ID NO: 181, SEQ ID NO: 182, SEQ ID NO: 183, SEQ ID 
NO: 184, SEQ ID NO: 185, SEQ ID NO: 186, SEQ ID NO: 187, 

630 SEQ ID NO: 188, SEQ ID NO: 189, SEQ ID NO: 190, SEQ ID NO: 
191, SEQ, ID NO: 192, SEQ. ID NO : 193, SEQ. ID NO: 194, SEQ ID 
NO: 195, SEQ ID NO: 196, SEQ ID NO: 197, SEQ ID NO: 198, 
SEQ ID NO: 199, SEQ ID NO: 200, SEQ ID NO: 201, SEQ ID NO: 
202, SEQ ID NO: 203, SEQ ID NO: 2 04, SEQ ID NO: 205, SEQ ID 

635 NO: 206, SEQ ID NO : 207, SEQ ID NO : 208, SEQ ID NO : 209, 
SEQ ID NO: 210, SEQ. ID NO: 211, SEQ ID NO : 212, SEQ ID NO: 
213, SEQ ID NO: 214, SEQ ID NO:215, SEQ ID N'O: 216. SEQ ID 
NO: 217, SEQ ID NO: 218, SEQ. ID NO: 219, SEQ ID NO : 220, 
SEQ ID NO: 22 1. SEQ ID NO:222, SEQ ID NO:223, SEQ ID NO: 

640 224, SEQ ID NO: 225, SEQ ID NO: 226, SEQ ID NO: 227, SEQ ID 
NO: 228, SEQ ID NO: 229, SEQ ID NO: 230, SEQ ID NO: 231, 
SEQ ID NO: 232, SEQ. ID NO: 233, SEQ. ID NO: 234, SEQ ID NO: 
235, SEQ ID NO: 236, SEQ ID NO:237, SEQ ID NO:238, SEQ ID 
NO: 239, SEQ ID NO: 240, SEQ ID NO: 241, SEQ ID NO: 242, 

645 SEQ ID NO: 243, SEQ ID NO: 245, SEQ ID NO: 246, SEQ ID NO: 
247, SEQ. ID NO: 248, SEQ ID NO: 249, SEQ ID NO: 250, SEQ ID 
NO: 251, SEQ ID NO: 252, SEQ ID NO: 253, SEQ ID NO: 254,
SEQ ID NO: 255, SEQ ID NO: 256, SEQ ID NO: 257, SEQ ID NO:
258, SEQ ID NO: 259, SEQ ID NO: 260, SEQ ID NO: 261, SEQ ID 

650 NO: 262, SEQ ID NO: 263, SEQ ID NO: 264, SEQ ID NO: 265, 
SEQ ID NO: 266, SEQ ID NO: 267, SEQ ID NO: 268, SEQ ID NO: 
269, SEQ ID NO: 270, SEQ ID NO: 271, SEQ ID NO: 272, SEQ ID 
NO: 273, SEQ ID NO: 274, SEQ ID NO: 275, SEQ ID NO: 276, 
SEQ ID NO: 277, SEQ ID NO: 278, SEQ ID NO: 279, SEQ ID NO: 

655 280, SEQ ID NO: 281, SEQ ID NO: 282, SEQ ID NO: 283, SEQ ID 
NO: 284, SEQ ID NO: 285, SEQ ID NO: 286, SEQ ID NO: 287, 
SEQ ID NO: 288, SEQ ID NO: 289, SEQ ID NO: 290, SEQ ID NO: 
291, SEQ ID NO: 292, SEQ ID NO : 293, SEQ ID NO: 294, SEQ ID 
NO: 295, SEQ ID NO: 296, SEQ ID NO: 297, SEQ ID NO: 298, 

660 SEQ ID NO: 299, SEQ ID NO: 300, SEQ ID NO: 301, SEQ ID NO: 
302, SEQ ID NO: 303, SEQ ID NO: 304, SEQ ID NO: 305, SEQ ID 
NO: 306, SEQ ID NO : 307, SEQ ID NO : 308, SEQ ID NO: 309, 
SEQ ID NO: 310, SEQ ID NO: 311, SEQ ID NO: 312, SEQ ID NO:
313, SEQ ID NO: 314, SEQ ID NO: 315, SEQ ID NO: 316, SEQ ID 

665 NO: 317, SEQ ID NO: 318, SEQ ID NO: 319, SEQ ID NO: 320,
SEQ ID NO:321, SEQ ID NO:322, SEQ ID NO:323, SEQ ID NO: 
324, SEQ ID NO: 325, SEQ ID NO: 326, SEQ ID NO: 327, SEQ ID 
NO: 328, SEQ ID NO: 329, SEQ ID NO: 330, SEQ ID NO: 331, 
SEQ ID NO: 332, SEQ ID NO: 333, SEQ ID NO : 334, SEQ ID NO: 

670 335, SEQ ID NO : 336, SEQ ID NO: 338, SEQ ID NO: 339, SEQ ID 
NO: 340, SEQ ID NO : 341, SEQ ID NO: 342, SEQ ID NO: 343, 
SEQ ID NO: 344, SEQ ID NO: 345, SEQ ID NO : 346, SEQ ID NO: 
347, SEQ ID NO: 348, SEQ ID NO: 349, SEQ ID NO: 350, SEQ ID 
NO: 351, SEQ ID NO: 352, SEQ ID NO: 353, SEQ ID NO: 354, 

675 SEQ ID NO: 355, SEQ ID NO: 356, SEQ ID NO: 357, SEQ ID NO: 
358, SEQ ID NO: 359, SEQ ID NO: 360, SEQ ID NO: 361, SEQ ID 
NO: 362, SEQ ID NO : 363, SEQ ID NO : 364, SEQ ID NO: 365, 
SEQ ID NO: 366, SEQ ID NO: 367, SEQ ID NO: 368, SEQ ID NO: 
369, SEQ. ID NO: 370, SEQ. ID NO: 371, SEQ ID NO: 372, SEQ ID 

680 NO: 373, SEQ ID NO : 374, SEQ. ID NO: 375, SEQ ID NO: 376, 
SEQ ID NO: 377, SEQ ID NO: 378, SEQ ID NO: 379, SEQ ID NO:
380, SEQ ID NO: 381, SEQ ID NO: 382, SEQ ID NO: 383, SEQ ID 
NO: 384, SEQ ID NO : 385, SEQ ID NO : 386, SEQ ID NO : 387, 
SEQ ID NO: 388, SEQ ID NO: 389, SEQ ID NO: 390, SEQ ID NO: 

685 391, SEQ ID NO : 392, SEQ ID NO: 393, SEQ ID NO: 394, SEQ ID 
NO: 395, SEQ ID NO: 396, SEQ ID NO: 397, SEQ ID NO : 398, 
SEQ ID NO: 399, SEQ ID NO: 400, SEQ ID NO: 401, SEQ ID NO: 
402, SEQ ID NO: 403, SEQ ID NO: 404, SEQ ID NO: 405, SEQ ID 
NO: 406, SEQ ID NO: 407, SEQ ID NO : 408, SEQ ID NO : 409, 

690 SEQ ID NO: 411, SEQ ID NO: 412, SEQ ID NO: 413, SEQ ID NO:
414, SEQ ID NO: 415, SEQ. ID NO: 416, SEQ ID NO: 417, SEQ ID 
NO: 418, SEQ ID NO: 419, SEQ ID NO: 420, SEQ ID NO: 421, 
SEQ ID NO: 422, SEQ ID NO: 423, SEQ ID NO: 424, SEQ ID NO: 
425, SEQ ID NO: 426, SEQ ID NO: 427, SEQ ID NO: 428, SEQ ID 

695 NO: 429, SEQ ID NO: 430, SEQ ID NO: 431, SEQ ID NO: 432,
SEQ ID NO: 433, SEQ ID NO: 434, SEQ ID NO: 435, SEQ ID NO: 
436, SEQ ID NO: 437, SEQ ID NO: 438, SEQ ID NO: 439, SEQ ID 
NO: 440, SEQ ID NO: 441, SEQ ID NO: 442, SEQ ID NO: 443, 
SEQ ID NO: 444, SEQ ID NO: 445, SEQ ID NO: 446, SEQ ID NO: 

700 447, SEQ ID NO: 448, SEQ ID NO: 449, SEQ ID NO: 450, SEQ ID 
NO: 451, SEQ ID NO: 452, SEQ ID NO: 453, SEQ ID NO : 454, 
SEQ ID NO: 455, SEQ ID NO: 456, SEQ ID NO: 457, SEQ ID NO: 
458, SEQ ID NO: 459, SEQ ID NO: 460, SEQ ID NO: 461, SEQ ID 
NO: 462, SEQ ID NO: 463, SEQ ID NO : 464, SEQ ID NO: 465, 

705 SEQ ID NO: 466, SEQ ID NO: 467, SEQ ID NO: 468, SEQ ID NO:
469, SEQ ID NO: 470, SEQ ID NO: 471, SEQ ID NO: 472, SEQ ID 
NO: 473, SEQ ID NO: 474, SEQ ID NO: 475, SEQ ID NO : 476, 
SEQ ID NO: 477, SEQ ID NO: 478, SEQ ID NO: 479, SEQ ID NO:
480, SEQ ID NO: 481, SEQ ID NO: 482, SEQ ID NO: 483, SEQ ID 

710 NO: 485, SEQ ID NO: 486, SEQ ID NO: 487, SEQ ID NO : 488, 
SEQ ID NO: 489, SEQ. ID NO: 490, SEQ ID NO: 491, SEQ ID NO: 
492, SEQ ID NO: 493, SEQ. ID NO: 494, SEQ. ID NO: 495, SEQ ID 
NO: 496, SEQ ID NO: 497, SEQ. ID NO : 498, SEQ ID NO : 499, 
SEQ ID NO: 500, SEQ ID NO: 501, SEQ ID NO: 502, SEQ ID NO: 
715 503, SEQ ID NO: 504, SEQ ID NO: 505, SEQ ID NO: 506, SEQ ID
NO: 507, SEQ ID NO: 508, SEQ ID NO: 509, SEQ ID NO: 510,
SEQ ID NO: 511, SEQ ID NO: 512, SEQ ID NO: 513, SEQ ID NO:
514, SEQ. ID NO: 515, SEQ. ID NO: 516, SEQ ID NO: 517, SEQ ID 
NO: 518, SEQ ID NO: 519, SEQ ID NO: 520, SEQ ID NO: 521, 

720 SEQ ID NO: 522, SEQ ID NO: 523, SEQ ID NO: 524, SEQ ID NO:
525, SEQ ID NO: 526, SEQ ID NO: 527, SEQ ID NO: 528, SEQ ID 
NO: 529, SEQ ID NO : 530, SEQ ID NO: 531, SEQ ID NO: 532, 
SEQ ID NO: 533, SEQ ID NO: 534, SEQ ID NO: 535, SEQ ID NO: 
536, SEQ ID NO: 537, SEQ ID NO: 538, SEQ ID NO: 539, SEQ ID 

725 NO: 540, SEQ ID NO: 541, SEQ. ID NO: 542, SEQ ID NO : 543, 
SEQ ID NO: 544, SEQ ID NO: 545, SEQ ID NO : 546, SEQ ID NO: 
547, SEQ ID NO: 548, SEQ ID NO: 549, SEQ ID NO: 550, SEQ ID 
NO: 551, SEQ ID NO: 552, SEQ ID NO: 553, SEQ ID NO: 555, 
SEQ ID NO: 556, SEQ, ID NO: 557, SEQ ID NO : 558, SEQ ID NO: 

730 559, SEQ ID NO : 560, SEQ ID NO: 561, SEQ ID NO: 562, SEQ ID 
NO: 563, SEQ ID NO: 564, SEQ ID NO: 565, SEQ ID NO : 566, 
SEQ ID NO: 567, SEQ ID NO: 568, SEQ ID NO: 569, SEQ ID NO: 
570, SEQ ID NO: 571, SEQ. ID NO: 572, SEQ ID NO: 573, SEQ ID 
NO: 574, SEQ ID NO: 575, SEQ ID NO: 576, SEQ ID NO: 577, 

735 SEQ ID NO: 578, SEQ ID NO : 579, SEQ ID NO: 580, SEQ ID NO: 
581, SEQ ID NO: 582, SEQ ID NO: 583, SEQ ID NO: 584, SEQ ID 
NO: 585, SEQ ID NO : 586, SEQ ID NO: 587, SEQ ID NO : 588, 
SEQ ID NO: 589, SEQ ID NO: 590, SEQ ID NO: 591, SEQ ID NO: 
592, SEQ ID NO: 593, SEQ ID NO: 594, SEQ ID NO: 595, SEQ ID 

740 NO: 596, SEQ ID NO: 597, SEQ ID NO : 598, SEQ ID NO : 599, 
SEQ ID NO: 600, SEQ ID NO: 601, SEQ ID NO: 602, SEQ ID NO: 
603, SEQ ID NO: 604, SEQ ID NO: 605, SEQ ID NO: 606, SEQ ID
NO: 607, SEQ ID NO: 608, SEQ ID NO: 609, SEQ ID NO: 610,
SEQ ID NO: 611, SEQ ID NO : 612, SEQ ID NO: 613, SEQ ID NO: 

745 614, SEQ ID NO: 615, SEQ ID NO:616, SEQ ID NO:617, SEQ ID 
NO: 618, SEQ ID NO: 619, SEQ ID NO: 620, SEQ ID NO: 621, 
SEQ ID NO: 622, SEQ ID NO: 623, SEQ ID NO : 624, SEQ ID NO: 
625, SEQ. ID NO: 626, SEQ. ID NO:627, SEQ ID NO:628, SEQ ID 
NO: 629, SEQ ID NO: 631, SEQ ID NO: 632, SEQ ID NO: 633,

750 SEQ ID NO: 634, SEQ ID NO: 635, SEQ ID NO: 636, SEQ ID NO: 
637, SEQ ID NO: 638, SEQ ID NO: 639, SEQ ID NO: 640, SEQ ID 
NO: 641, SEQ ID NO: 642, SEQ. ID NO: 643, SEQ ID NO : 644, 
SEQ ID NO: 645, SEQ ID NO: 646, SEQ ID NO: 647, SEQ ID NO: 
648, SEQ ID NO: 649, SEQ ID NO: 650, SEQ ID NO: 651, SEQ ID

755 NO: 652, SEQ ID NO: 653, SEQ ID NO : 654, SEQ ID NO : 655, 
SEQ ID NO: 656, SEQ ID NO: 657, SEQ ID NO : 658, SEQ ID NO: 
659, SEQ ID NO: 660, SEQ ID NO: 661, SEQ ID NO: 662, SEQ ID 
NO: 663, SEQ ID NO: 664, SEQ ID NO: 665, SEQ ID NO : 666, 
SEQ ID NO: 667, SEQ ID NO : 668, SEQ ID NO: 669, SEQ ID NO: 

760 670, SEQ ID NO: 671, SEQ ID NO: 672, SEQ ID NO: 673, SEQ ID 
NO: 674, SEQ ID NO: 675, SEQ ID NO: 676, SEQ ID NO: 677, 
SEQ ID NO: 678, SEQ ID NO: 679, SEQ ID NO : 680, SEQ ID NO: 
681, SEQ ID NO: 682, SEQ ID NO: 683, SEQ ID NO: 684, SEQ ID
NO: 685, SEQ ID NO : 686, SEQ ID NO: 687, SEQ ID NO : 688, 

765 SEQ ID NO: 690, SEQ ID NO: 691, SEQ ID NO: 692, SEQ ID NO:
693, SEQ ID NO: 694, SEQ ID NO: 695, SEQ ID NO: 696, SEQ ID 
NO: 697, SEQ ID NO : 698, SEQ ID NO: 699, SEQ ID NO : 700, 
SEQ ID NO: 701, SEQ ID NO: 702, SEQ ID NO: 703, SEQ ID NO: 
704, SEQ ID NO: 705, SEQ ID NO: 706, SEQ ID NO: 707, SEQ ID 

770 NO: 708, SEQ ID NO: 709, SEQ ID NO : 710, SEQ ID NO: 711, 
SEQ ID NO: 712, SEQ ID NO : 713, SEQ ID NO: 714, SEQ ID NO: 
715, SEQ ID NO: 716, SEQ ID NO: 717, SEQ ID NO:718, SEQ ID 
NO: 719, SEQ ID NO: 720, SEQ ID NO: 721, SEQ ID NO: 722, 
SEQ ID NO: 723, SEQ ID NO: 724, SEQ ID NO: 725, SEQ ID NO: 

775 726, SEQ ID NO: 727, SEQ ID NO: 728, SEQ ID NO: 729, SEQ ID 
NO: 730, SEQ ID NO : 731, SEQ ID NO: 732, SEQ ID NO : 733, 
SEQ ID NO: 734, SEQ ID NO: 735, SEQ ID NO : 736, SEQ ID NO: 
737, SEQ ID NO: 738, SEQ ID NO: 739, SEQ ID NO: 740, SEQ ID 
NO: 741, SEQ ID NO: 742, SEQ ID NO: 743, SEQ ID NO : 744, 

780 SEQ ID NO: 745, SEQ ID NO: 746, SEQ ID NO: 747, SEQ ID NO: 
748, SEQ. ID NO: 749, SEQ. ID NO: 750, SEQ ID NO: 751, SEQ ID 
NO: 752, SEQ ID NO : 753, SEQ. ID NO : 754, SEQ ID NO : 756, 
SEQ ID NO: 757, SEQ ID NO: 758, SEQ ID NO: 759, SEQ ID NO:
760, SEQ ID NO: 761, SEQ ID NO: 762, SEQ ID NO: 763, SEQ ID 

785 NO: 764, SEQ ID NO : 765, SEQ ID NO : 766, SEQ ID NO: 767, 
SEQ ID NO: 768, SEQ ID NO: 769, SEQ ID NO: 770, SEQ ID NO: 
771, SEQ ID NO: 772, SEQ ID NO: 773, SEQ ID NO: 774, SEQ ID 
NO: 775, SEQ ID NO: 776, SEQ ID NO: 777, SEQ ID NO: 778, 
SEQ ID NO: 779, SEQ ID NO: 780, SEQ ID NO: 781, SEQ ID NO: 

790 782, SEQ ID NO: 783, SEQ ID NO: 784, SEQ ID NO: 785, SEQ ID 
NO: 786, SEQ ID NO: 787, SEQ ID NO: 788, SEQ ID NO: 789, 
SEQ ID NO: 790, SEQ ID NO: 791, SEQ ID NO: 792, SEQ ID NO: 
793, SEQ ID NO: 794, SEQ. ID NO: 795, SEQ ID NO: 796, SEQ ID 
NO: 797, SEQ ID NO: 798, SEQ ID NO: 799, SEQ ID NO : 800, 

795 SEQ ID NO: 801, SEQ ID NO: 802, SEQ ID NO: 803, SEQ ID NO: 
804, SEQ ID NO: 805, SEQ ID NO: 806, SEQ ID NO: 807, SEQ ID 
NO: 808, SEQ ID NO: 809, SEQ ID NO: 810, SEQ ID NO: 811,
SEQ ID NO: 812, SEQ ID NO: 813, SEQ ID NO: 814, SEQ ID NO: 
815, SEQ ID NO: 817, SEQ ID NO: 818, SEQ ID NO: 819, SEQ ID

800 NO: 820, SEQ ID NO: 821, SEQ ID NO: 822, SEQ ID NO: 823, 
SEQ ID NO: 824, SEQ ID NO: 825, SEQ ID NO: 826, SEQ ID NO: 
827, SEQ ID NO: 828, SEQ ID NO: 829, SEQ ID NO: 830, SEQ ID 
NO: 831, SEQ ID NO: 832, SEQ ID NO: 833, SEQ ID NO: 834, 
SEQ ID NO: 835, SEQ ID NO: 836, SEQ ID NO: 837, SEQ ID NO: 

805 838, SEQ ID NO: 839, SEQ ID NO: 840, SEQ ID NO: 841, SEQ ID 
NO: 842, SEQ ID NO : 843, SEQ ID NO : 844, SEQ ID NO: 845, 
SEQ ID NO: 846, SEQ ID NO: 847, SEQ ID NO: 848, SEQ ID NO:
849, SEQ ID NO: 850, SEQ ID NO: 851, SEQ ID NO: 852, SEQ ID 
NO: 853, SEQ ID NO : 854, SEQ ID NO: 855, SEQ ID NO : 856, 

810 SEQ ID NO: 857, SEQ ID NO: 858, SEQ ID NO: 859, SEQ ID NO: 
860, SEQ ID NO: 861, SEQ ID NO: 862, SEQ ID NO: 863, SEQ ID 
NO: 864, SEQ ID NO: 865, SEQ ID NO : 866, SEQ ID NO : 867, 
SEQ ID NO: 868, SEQ. ID NO: 869, SEQ ID NO: 870, SEQ ID NO: 
871, SEQ ID NO: 872, SEQ ID NO : 873, SEQ ID NO: 874, SEQ 

815 ID NO: 875, SEQ ID NO: 877, SEQ. ID NO: 878, SEQ ID NO: 879, 
SEQ ID NO: 880, SEQ ID NO: 881, SEQ ID NO: 882, SEQ ID NO: 
883, SEQ ID NO: 884, SEQ ID NO: 885, SEQ ID NO: 886, SEQ ID
NO: 887, SEQ ID NO: 888, SEQ ID NO: 889, SEQ ID NO : 890, 
SEQ ID NO: 891, SEQ ID NO: 892, SEQ ID NO: 893, SEQ ID NO: 

820 894, SEQ ID NO: 895, SEQ. ID NO: 896, SEQ ID NO: 897, SEQ ID 
NO: 898, SEQ ID NO: 899, SEQ ID NO: 900, SEQ ID NO: 901, 
SEQ ID NO: 902, SEQ ID NO : 903, SEQ ID NO : 904, SEQ ID NO: 
905, SEQ ID NO: 906, SEQ ID NO: 907, SEQ ID NO: 908, SEQ ID 
NO: 909, SEQ ID NO: 910, SEQ ID NO: 911, SEQ ID NO: 912, 

825 SEQ ID NO: 913, SEQ ID NO: 914, SEQ ID NO: 915, SEQ ID NO:
916, SEQ ID NO: 917, SEQ ID NO : 918, SEQ ID NO: 919, SEQ ID 
NO: 920, SEQ ID NO: 921, SEQ. ID NO: 922, SEQ ID NO: 923, 
SEQ ID NO: 924, SEQ ID NO: 925, SEQ ID NO: 926, SEQ ID NO: 
928, SEQ ID NO: 929, SEQ ID NO: 930, SEQ ID NO: 931, SEQ ID

830 NO: 932, SEQ ID NO: 933, SEQ ID NO: 934, SEQ ID NO : 935, 
SEQ ID NO: 936, SEQ ID NO: 937, SEQ ID NO : 938, SEQ ID NO: 
939, SEQ ID NO: 940, SEQ ID NO: 941, SEQ ID NO: 942, SEQ ID 
NO: 943, SEQ ID NO: 944, SEQ ID NO: 945, SEQ ID NO : 946, 
SEQ ID NO: 947, SEQ ID NO: 948, SEQ ID NO: 949, SEQ ID NO: 

835 950, SEQ ID NO : 951, SEQ. ID NO: 952, SEQ ID NO: 953, SEQ ID 
NO: 954, SEQ ID NO: 955, SEQ ID NO: 956, SEQ ID NO: 957, 
SEQ ID NO: 958, SEQ ID NO : 959, SEQ ID NO: 960, SEQ ID NO: 
961, SEQ ID NO: 962, SEQ ID NO: 963, SEQ ID NO: 964, SEQ ID 
NO: 965, SEQ ID NO : 966, SEQ ID NO: 967, SEQ ID NO : 968, 

840 SEQ ID NO: 969, SEQ ID NO: 970, SEQ ID NO: 971, SEQ ID NO: 
972, SEQ ID NO: 973, SEQ ID NO: 974, SEQ ID NO : 975, SEQ ID 
NO: 976, SEQ ID NO: 977, SEQ ID NO: 979, SEQ ID NO : 980, 
SEQ ID NO: 981, SEQ ID NO: 982, SEQ ID NO: 983, SEQ ID NO: 
984, SEQ ID NO: 985, SEQ ID NO: 986, SEQ ID NO: 987, SEQ ID 

845 NO: 988, SEQ ID NO: 989, SEQ ID NO: 990, SEQ ID NO: 991, 
SEQ ID NO: 992, SEQ ID NO: 993, SEQ. ID NO : 994, SEQ ID NO: 
995, SEQ ID NO: 996, SEQ ID NO: 997, SEQ ID NO: 998, SEQ ID 
NO: 999, SEQ ID NO: 1000, SEQ ID NO: 1001, SEQ ID NO: 1002, 
SEQ ID NO: 1003, SEQ ID NO: 1004, SEQ ID NO: 1005, SEQ ID

850 NO: 1006, SEQ. ID NO: 1007, SEQ ID NO: 1008, SEQ ID NO: 1009, 
SEQ ID NO: 1010, SEQ ID NO: 1011, SEQ ID NO: 1012, SEQ ID
NO: 1014, SEQ ID NO: 1015, SEQ ID NO: 1016, SEQ ID NO: 1017, 
SEQ ID NO: 1018, SEQ ID NO: 1019, SEQ ID NO: 1020, SEQ ID 
NO: 1021, SEQ ID NO: 1022, SEQ ID NO: 1023, SEQ ID NO: 1024,

855 SEQ ID NO: 1025, SEQ ID NO: 1026, SEQ ID NO: 1027, SEQ ID
NO: 1028, SEQ ID NO: 1030, SEQ ID NO: 1031, SEQ ID NO: 1032, 
SEQ ID NO: 1033, SEQ ID NO: 1034, SEQ ID NO: 1035, SEQ ID 
NO: 1036, SEQ ID NO: 1037, SEQ ID NO: 1038, SEQ ID NO: 1039, 
SEQ ID NO: 1040, SEQ ID NO: 1041, SEQ ID NO: 1042, SEQ ID 

860 NO: 1043, SEQ ID NO: 1044, SEQ ID NO: 1045, SEQ ID NO: 1046,
SEQ ID NO: 1047, SEQ ID NO: 1048, SEQ ID NO: 1049, SEQ ID 
NO: 1050, SEQ ID NO: 1051, SEQ ID NO: 1052, SEQ ID NO: 1053, 
SEQ ID NO: 1054, SEQ ID NO: 1056, SEQ ID NO: 1057, SEQ ID 
NO: 1058, SEQ ID NO: 1059, SEQ ID NO: 1061, SEQ ID NO: 1062, 

865 SEQ ID NO: 1063, SEQ ID NO: 1064, SEQ ID NO: 1065, SEQ ID
NO: 1066, SEQ ID NO: 1067, SEQ ID NO: 1068, SEQ. ID NO: 1069, 
SEQ ID NO: 1070, SEQ ID NO : 1071, SEQ ID NO: 1072, SEQ ID 
NO: 1073, SEQ ID NO: 1074, SEQ ID NO: 1075, SEQ ID NO:
1076, SEQ ID NO: 1077, SEQ ID NO: 1078, SEQ ID NO : 1079, 

870 SEQ ID NO: 1080, SEQ ID NO: 1081, SEQ ID NO: 1082, SEQ ID
NO: 1083, SEQ ID NO: 1084, SEQ ID NO: 1085, SEQ ID NO: 1086,
SEQ ID NO: 1087, SEQ ID NO: 1088, SEQ ID NO: 1089, SEQ ID 
NO: 1090, SEQ ID NO : 1091, SEQ ID NO: 1092, SEQ ID NO: 1094, 
SEQ ID NO: 1095, SEQ ID NO : 1096, SEQ ID NO: 1097, SEQ ID 

875 NO : 1098, SEQ ID NO : 1099, SEQ ID NO : 1100, SEQ ID NO : 1101, 
SEQ ID NO: 1102, SEQ ID NO: 1103, SEQ ID NO: 1104, SEQ ID
NO: 1105, SEQ ID NO: 1106, SEQ ID NO: 1107, SEQ ID NO: 1108, 
SEQ ID NO: 1109, SEQ ID NO: 1110, SEQ ID NO: 1111, SEQ ID
NO: 1112, SEQ ID NO: 1113, SEQ ID NO: 1114, SEQ ID NO:

880 1115, SEQ ID NO: 1116, SEQ ID NO: 1117, SEQ ID NO: 1118, SEQ
ID NO: 1119, SEQ ID NO: 1120, SEQ ID NO: 1121, SEQ ID NO:
1122, SEQ ID NO: 1123, SEQ ID NO: 1124, SEQ ID NO: 1125,
SEQ ID NO: 1126, SEQ. ID NO: 1127, SEQ. ID NO: 1129, SEQ ID 
NO: 1130, SEQ. ID NO : 1131, SEQ ID NO : 1132, SEQ ID NO: 
885 1133, SEQ ID NO: 1134, SEQ ID NO: 1135, SEQ ID NO: 1136,
SEQ ID NO: 1137, SEQ ID NO: 1138, SEQ ID NO: 1139, SEQ ID
NO: 1140, SEQ ID NO: 1141, SEQ ID NO: 1142, SEQ ID NO: 1143,
SEQ ID NO: 1144, SEQ ID NO: 1145, SEQ. ID NO: 1146, SEQ ID 
NO: 1147, SEQ ID NO: 1148, SEQ ID NO: 1149, SEQ ID NO: 1150,

890 SEQ ID NO: 1151, SEQ ID NO : 1152, SEQ ID NO: 1153, SEQ ID 
NO: 1154, SEQ ID NO: 1155, SEQ ID NO: 1156, SEQ. ID NO: 1158, 
SEQ ID NO: 1159, SEQ ID NO: 1160, SEQ ID NO: 1161, SEQ ID
NO: 1162, SEQ ID NO: 1163, SEQ ID NO: 1164, SEQ ID NO: 1165,
SEQ ID NO: 1166, SEQ ID NO: 1167, SEQ ID NO: 1168, SEQ ID

895 NO: 1169, SEQ ID NO : 1170, SEQ ID NO : 1171, SEQ ID NO: 
1172, SEQ ID NO: 1173, SEQ ID NO: 1174, SEQ ID NO: 1175, 
SEQ ID NO: 1176, SEQ ID NO: 1177, SEQ ID NO: 1178, SEQ ID
NO: 1179, SEQ ID NO: 1180, SEQ ID NO: 1181, SEQ ID NO: 1182, 
SEQ ID NO: 1183, SEQ ID NO: 1184, SEQ ID NO: 1185, SEQ ID 

900 NO: 1186, SEQ ID NO : 1187, SEQ ID NO: 1188, SEQ ID NO: 
1189, SEQ ID NO : 1190, SEQ ID NO : 1192, SEQ ID NO : 1193, 
SEQ ID NO: 1194, SEQ ID NO: 1195, SEQ ID NO: 1196, SEQ ID
NO: 1197, SEQ ID NO: 1198, SEQ ID NO: 1199, SEQ ID NO: 1200,
SEQ ID NO: 1201, SEQ ID NO: 1202, SEQ ID NO: 1203, SEQ ID

905 NO : 1204, SEQ. ID NO : 1205, SEQ ID NO : 1206, SEQ. ID NO : 1207, 
SEQ ID NO: 1208, SEQ ID NO: 1209, SEQ ID NO: 1210, SEQ ID 
NO: 1211, SEQ ID NO: 1213, SEQ. ID NO: 1214, SEQ. ID NO: 1215, 
SEQ ID NO: 1216, SEQ ID NO: 1217, SEQ ID NO: 1218, SEQ ID
NO: 1219, SEQ ID NO: 1220, SEQ ID NO: 1221, SEQ ID NO: 1222, 

910 SEQ ID NO: 1223, SEQ ID NO: 1224, SEQ. ID NO: 1225, SEQ ID 
NO: 1226, SEQ ID NO: 1227, SEQ ID NO : 1228, SEQ. ID NO: 
1229, SEQ ID NO: 1230, SEQ ID NO: 1231, SEQ ID NO: 1232, 
SEQ ID NO: 1233, SEQ ID NO: 1234, SEQ ID NO: 1235, SEQ ID
NO: 1236, SEQ ID NO: 1237, SEQ ID NO: 1238, SEQ. ID NO: 1239, 

915 SEQ ID NO: 1241, SEQ ID NO: 1242, SEQ ID NO: 1243, SEQ ID 
NO: 1244, SEQ ID NO: 1245, SEQ ID NO : 1246, SEQ ID NO: 
1247, SEQ ID NO: 1248, SEQ ID NO : 1249, SEQ ID NO : 1250, 
SEQ ID NO: 1251, SEQ ID NO: 1252, SEQ ID NO: 1253, SEQ ID
NO: 1254, SEQ ID NO: 1255, SEQ ID NO: 1256, SEQ ID NO: 1257,

920 SEQ ID NO: 1259, SEQ ID NO: 1260, SEQ ID NO: 1261, SEQ ID 
NO: 1262, SEQ ID NO: 1263, SEQ ID NO: 1264, SEQ ID NO: 1265, 
SEQ ID NO: 1266, SEQ ID NO : 1267, SEQ. ID NO: 1268, SEQ ID 
NO: 1269, SEQ ID NO: 1270, SEQ ID NO: 1271, SEQ ID NO: 1272, 
SEQ ID NO: 1273, SEQ ID NO: 1275, SEQ ID NO: 1276, SEQ ID

925 NO: 1277, SEQ ID NO : 1278, SEQ ID NO: 1279, SEQ. ID NO: 1280, 
SEQ ID NO: 1281, SEQ ID NO: 1282, SEQ ID NO: 1283, SEQ ID 
NO: 1284, SEQ ID NO: 1285, SEQ ID NO: 1286, SEQ ID NO: 
1287, SEQ ID NO: 1289, SEQ ID NO: 1290, SEQ ID NO: 1291, 
SEQ ID NO: 1292, SEQ ID NO: 1293, SEQ ID NO: 1294, SEQ ID

930 NO:1295, SEQ ID NO:1296, SEQ ID NO:1297, SEQ ID NO:1298, 
SEQ ID NO: 1299, SEQ ID NO: 1300, SEQ ID NO: 1301, SEQ ID 
NO: 1303, SEQ ID NO : 1304, SEQ ID NO: 1305, SEQ ID NO: 
1306, SEQ ID NO : 1307, SEQ. ID NO : 1308, SEQ. ID NO: 1310, 
SEQ ID NO: 1311, SEQ ID NO: 1312, SEQ ID NO: 1313, SEQ ID

935 NO: 1314, SEQ ID NO: 1315, SEQ ID NO: 1316, SEQ ID NO: 1317,
SEQ ID NO: 1318, SEQ ID NO: 1319, SEQ ID NO: 1320, SEQ ID
NO: 1322, SEQ ID NO: 1323, SEQ ID NO: 1324, SEQ ID NO: 1325,
SEQ ID NO: 1326, SEQ ID NO: 1327, SEQ ID NO: 1328, SEQ ID
NO: 1330, SEQ ID NO : 1331, SEQ ID NO: 1332, SEQ ID NO: 1333, 

940 SEQ ID NO: 1334, SEQ ID NO: 1335, SEQ ID NO: 1336, SEQ ID
NO: 1337, SEQ ID NO : 1339, SEQ ID NO: 1340, SEQ. ID NO: 1341, 
SEQ ID NO: 1342, SEQ ID NO: 1343, SEQ ID NO: 1344, SEQ ID 
NO: 1345, SEQ ID NO: 1346, SEQ ID NO: 1347, SEQ ID NO: 
1349, SEQ ID NO: 1350, SEQ ID NO: 1351, SEQ ID NO: 1352, 

945 SEQ ID NO: 1353, SEQ ID NO: 1354, SEQ ID NO: 1355, SEQ ID
NO: 1356, SEQ ID NO: 1357, SEQ ID NO: 1358, SEQ ID NO: 1360, 
SEQ ID NO: 1361, SEQ ID NO: 1362, SEQ ID NO: 1363, SEQ ID 
NO: 1364, SEQ ID NO : 1365, SEQ ID NO: 1367, SEQ ID NO: 
1368, SEQ. ID NO: 1369, SEQ ID NO: 1370, SEQ. ID NO: 1371, 

950 SEQ ID NO: 1375, SEQ ID NO: 1376, SEQ ID NO: 1377, SEQ ID
NO: 1378, SEQ ID NO: 1379, SEQ ID NO: 1381, SEQ ID NO: 1382,
SEQ ID NO: 1383, SEQ ID NO: 1384, SEQ. ID NO: 1385, SEQ ID 
NO: 1387, SEQ ID NO: 1388, SEQ ID NO: 1389, SEQ ID NO: 1390,
SEQ ID NO: 1391, SEQ ID NO: 1392, SEQ ID NO: 1393, SEQ ID 

955 NO : 1395, SEQ ID NO : 1396, SEQ ID NO : 1397, SEQ ID NO : 1398, 
SEQ ID NO: 1399, SEQ ID NO: 1400, SEQ ID NO: 1402, SEQ ID
NO: 1403, SEQ ID NO: 1404, SEQ ID NO: 1405, SEQ ID NO: 1406, 
SEQ ID NO: 1407, SEQ ID NO: 1409, SEQ ID NO: 1410, SEQ ID 
NO: 1412, SEQ ID NO: 1413, SEQ ID NO: 1414, SEQ ID NO: 

960 1415, SEQ ID NO: 1416, SEQ. ID NO: 1417, SEQ ID NO : 1419, 
SEQ ID NO: 1420, SEQ ID NO: 1421, SEQ ID NO: 1422, SEQ ID
NO: 1423, SEQ ID NO: 1424, SEQ ID NO: 1425, SEQ ID NO: 1427, 
SEQ ID NO: 1428, SEQ ID NO: 1429, SEQ ID NO: 1430, SEQ ID 
NO: 1431, SEQ ID NO : 1432, SEQ. ID NO : 1433, SEQ. ID NO: 

965 1434, SEQ ID NO: 1435, SEQ ID NO: 1437, SEQ ID NO: 1438, 
SEQ ID NO: 1439, SEQ ID NO: 1440, SEQ ID NO: 1441, SEQ ID
NO: 1442, SEQ ID NO: 1444, SEQ ID NO: 1445, SEQ ID NO: 1446,
SEQ ID NO: 1447, SEQ ID NO: 1448, SEQ ID NO: 1449, SEQ ID
NO: 1451, SEQ ID NO: 1452, SEQ ID NO: 1453, SEQ ID NO: 1454,

970 SEQ ID NO: 1455, SEQ ID NO: 1456, SEQ ID NO: 1458, SEQ ID 
NO: 1459, SEQ ID NO: 1461, SEQ ID NO: 1462, SEQ ID NO : 1463, 
SEQ ID NO: 1464, SEQ ID NO: 1465, SEQ ID NO: 1466, SEQ ID
NO: 1468, SEQ ID NO: 1469, SEQ ID NO: 1470, SEQ ID NO: 1472,
SEQ ID NO: 1474, SEQ ID NO: 1475, SEQ ID NO: 1476, SEQ ID 

975 NO: 1477, SEQ ID NO : 1479, SEQ ID NO : 1480, SEQ ID NO: 
1481, SEQ ID NO: 1482, SEQ. ID NO : 1483, SEQ ID NO : 1484, 
SEQ ID NO: 1485, SEQ ID NO: 1486, SEQ ID NO: 1488, SEQ ID
NO: 1490, SEQ. ID NO: 1491, SEQ ID NO: 1492, SEQ ID NO: 1493, 
SEQ ID NO: 1495, SEQ ID NO: 1496, SEQ. ID NO: 1497, SEQ ID 

980 NO: 1498, SEQ ID NO : 1500, SEQ ID NO: 1502, SEQ ID NO: 
1503, SEQ. ID NO: 1504, SEQ. ID NO: 1505, SEQ. ID NO : 1507, 
SEQ ID NO: 1509, SEQ ID NO: 1512, SEQ ID NO: 1513, SEQ ID
NO: 1514, SEQ ID NO: 1515, SEQ ID NO: 1517, SEQ ID NO: 1518,
SEQ ID NO: 1519, SEQ ID NO: 1521, SEQ ID NO: 1522, SEQ ID

985 NO: 1523, SEQ. ID NO: 1524, SEQ. ID NO: 1525, SEQ ID NO: 1527, 
SEQ ID NO: 1528, SEQ ID NO: 1529, SEQ. ID NO: 1530, SEQ ID 
NO: 1531, SEQ ID NO: 1533, SEQ ID NO: 1534, SEQ ID NO: 1535,
SEQ ID NO: 1536, SEQ ID NO: 1538, SEQ ID NO: 1539, SEQ ID
NO: 1541, SEQ ID NO: 1542, SEQ ID NO: 1543, SEQ ID NO: 1544, 
990 SEQ ID NO: 1546, SEQ ID NO: 1548, SEQ. ID NO: 1550, SEQ ID 
NO: 1552, SEQ ID NO: 1554, SEQ ID NO: 1556, SEQ ID NO:
1557, SEQ ID NO: 1559, SEQ ID NO: 1560, SEQ ID NO: 1561,
SEQ ID NO: 1562, SEQ ID NO: 1564, SEQ ID NO: 1565, SEQ ID
NO: 1567, SEQ ID NO: 1568, SEQ ID NO: 1570, SEQ ID NO: 1572, 
995 SEQ ID NO: 1573, SEQ ID NO: 1574, SEQ ID NO: 1575, SEQ ID 
NO: 1577, SEQ ID NO : 1578, SEQ ID NO: 1579, SEQ ID NO: 
1581, SEQ ID NO: 1582, SEQ ID NO: 1583, SEQ ID NO: 1585, 
SEQ ID NO: 1586, SEQ ID NO: 1588, SEQ ID NO: 1589, SEQ ID
NO: 1590, SEQ ID NO: 1592, SEQ ID NO: 1593, SEQ ID NO: 1595,

1000 SEQ ID NO: 1597, SEQ ID NO: 1598, SEQ ID NO: 1600, SEQ ID 
NO: 1602, SEQ ID NO: 1606, SEQ ID NO: 1608, SEQ. ID NO: 1609, 
SEQ ID NO: 1610, SEQ ID NO: 1611, SEQ ID NO: 1613, SEQ ID 
NO: 1614, SEQ ID NO: 1616, SEQ ID NO: 1618, SEQ ID NO: 1620, 
SEQ ID NO: 1621, SEQ ID NO: 1623, SEQ ID NO: 1625, SEQ ID

1005 NO: 1628, SEQ ID NO: 1630, SEQ ID NO: 1631, SEQ ID NO: 1633, 
SEQ ID NO : 1634, SEQ ID NO: 1638, SEQ. ID NO: 1641, SEQ ID 
NO: 1642, SEQ ID NO: 1644, SEQ ID NO: 1645, SEQ ID NO:
1647, SEQ ID NO: 1648, SEQ ID NO: 1650, SEQ ID NO: 1651,
SEQ ID NO: 1653, SEQ ID NO: 1654, SEQ ID NO: 1656, SEQ ID

1010 NO: 1657, SEQ ID NO: 1659, SEQ ID NO: 1661, SEQ. ID NO: 1663, 
SEQ ID NO: 1665, SEQ ID NO: 1667, SEQ ID NO: 1671, SEQ ID
NO: 1674, SEQ ID NO : 1676, SEQ. ID NO: 1678, SEQ. ID NO: 
1679, SEQ ID NO: 1681, SEQ ID NO: 1684, SEQ ID NO: 1686, 
SEQ ID NO: 1687, SEQ ID NO: 1689, SEQ ID NO: 1692, SEQ ID

1015 NO: 1693, SEQ ID NO: 1695, SEQ ID NO: 1697, SEQ ID NO: 1698,
SEQ ID NO: 1702, and SEQ ID NO: 1703, or (b) an amino acid
sequence in which several amino acids are deleted, replaced or
added in any of the amino acid sequences set forth in (a).

6. A vector containing the nucleic-acid molecule of claim 1
as an inserted substance.

7. The vector of claim 6, wherein the inserted substance is
operably linked to a transcriptional regulatory element.

8. A host cell which is transformed with the vector of claim

9. A method of producing a polypeptide specific to O157:H7,
comprising cultivating the host cell of claim 8.

10. An oligonucleotide or polynucleotide specific to
enterohemorrhagic pathogenic E. coli O157:H7, comprising a
nucleotide sequence constituted of at least 8 nucleotides in

(a) a nucleotide sequence selected from a group
comprising the following SEQ IDs: SEQ ID NO: 1, SEQ ID NO:

1035 132, SEQ ID NO: 244, SEQ ID NO: 337, SEQ ID NO: 410, SEQ ID 
NO: 484, SEQ ID NO: 554, SEQ ID NO: 630, SEQ ID NO: 689,
SEQ ID NO: 755, SEQ ID NO: 816, SEQ ID NO: 876, SEQ ID NO: 
927, SEQ ID NO: 978, SEQ ID NO: 1013, SEQ ID NO: 1029, SEQ
ID NO: 1055, SEQ ID NO: 1060, SEQ ID NO: 1093, SEQ ID NO:

1040 1128, SEQ ID NO: 1157, SEQ ID NO: 1191, SEQ ID NO: 1212, 
SEQ ID NO: 1240, SEQ ID NO: 1258, SEQ ID NO: 1274, SEQ ID 
NO: 1288, SEQ ID NO: 1302, SEQ ID NO: 1309, SEQ ID NO: 1321, 
SEQ ID NO: 1329, SEQ ID NO: 1338, SEQ ID NO: 1348, SEQ ID
NO: 1359, SEQ ID NO: 1366, SEQ ID NO: 1374, SEQ ID NO: 1380, 

1045 SEQ ID NO: 1386, SEQ ID NO: 1394, SEQ ID NO: 1401, SEQ ID 
NO: 1408, SEQ ID NO: 1411, SEQ ID NO: 1418, SEQ ID NO: 1426,
SEQ ID NO: 1436, SEQ ID NO: 1443, SEQ ID NO: 1450, SEQ ID 
NO: 1457, SEQ ID NO: 1460, SEQ ID NO: 1467, SEQ ID NO: 1471,
SEQ ID NO: 1473, SEQ ID NO: 1478, SEQ ID NO: 1487, SEQ

1050 ID NO: 1489, SEQ ID NO: 1494, SEQ ID NO: 1499, SEQ ID NO:
1501, SEQ ID NO: 1506, SEQ ID NO: 1508, SEQ. ID NO: 1510, 
SEQ ID NO: 1511, SEQ ID NO: 1516, SEQ ID NO: 1520, SEQ ID 
NO: 1526, SEQ ID NO: 1532, SEQ ID NO: 1537, SEQ ID NO: 1540,
SEQ ID NO: 1545, SEQ ID NO: 1547, SEQ ID NO: 1549, SEQ ID
1055 NO: 1551, SEQ ID NO: 1553, SEQ ID NO: 1555, SEQ ID NO: 1558,
SEQ ID NO: 1563, SEQ ID NO: 1566, SEQ ID NO: 1569, SEQ ID 
NO: 1571, SEQ ID NO : 1576, SEQ ID NO: 1580, SEQ ID NO: 1584, 
SEQ ID NO: 1587, SEQ ID NO: 1591, SEQ ID NO: 1594, SEQ ID 
NO: 1596, SEQ ID NO: 1599, SEQ ID NO: 1601, SEQ ID NO: 1603,

1060 SEQ ID NO: 1604, SEQ ID NO: 1605, SEQ ID NO: 1607, SEQ
ID NO: 1612, SEQ ID NO: 1615, SEQ ID NO: 1617, SEQ ID NO:
1619, SEQ ID NO: 1622, SEQ ID NO: 1624, SEQ ID NO: 1626, 
SEQ ID NO: 1627, SEQ ID NO: 1629, SEQ ID NO: 1632, SEQ ID 
NO: 1635, SEQ ID NO: 1636, SEQ ID NO: 1637, SEQ ID NO: 1639, 

1065 SEQ ID NO: 1640, SEQ ID NO: 1643, SEQ ID NO: 1646, SEQ
ID NO: 1649, SEQ ID NO: 1652, SEQ ID NO: 1655, SEQ ID NO:
1658, SEQ ID NO: 1660, SEQ ID NO: 1662, SEQ ID NO: 1664,
SEQ ID NO: 1666, SEQ ID NO: 1668, SEQ ID NO: 1669, SEQ ID 
NO: 1670, SEQ ID NO : 1672, SEQ ID NO: 1673, SEQ ID NO: 1675, 

1070 SEQ ID NO: 1677, SEQ ID NO: 1680, SEQ ID NO: 1682, SEQ ID
NO: 1683, SEQ ID NO : 1685, SEQ ID NO: 1688, SEQ ID NO: 1690, 
SEQ ID NO : 1691, SEQ ID NO: 1694, SEQ ID NO: 1696, SEQ ID 
NO: 1699, SEQ ID NO: 1700, SEQ ID NO: 1701, SEQ ID NO: 1704, 
SEQ ID NO: 1705, SEQ ID NO: 1706, SEQ ID NO: 1707, SEQ ID 

1075 NO: 1708, SEQ ID NO: 1709, SEQ ID NO: 1710, SEQ ID NO: 1711,
SEQ ID NO: 1712, SEQ ID NO: 1713, SEQ ID NO: 1715, SEQ
ID NO: 1716, SEQ ID NO: 1717, SEQ ID NO: 1718, SEQ ID NO:
1719, SEQ ID NO: 1720, SEQ ID NO: 1721, SEQ ID NO: 1722, 
SEQ ID NO: 1723, SEQ ID NO: 1724, SEQ ID NO: 1725, SEQ ID 

1080 NO: 1726, SEQ ID NO: 1727, SEQ ID NO: 1728, SEQ ID NO: 1729, 
SEQ ID NO: 1730, SEQ ID NO: 1731, SEQ ID NO: 1732, SEQ ID
NO: 1733, SEQ ID NO: 1734, SEQ ID NO: 1735, SEQ ID NO: 1736, 
SEQ ID NO: 1737, SEQ ID NO: 1738, SEQ ID NO: 1739, SEQ ID 
NO: 1740, SEQ ID NO: 1741, SEQ ID NO: 1742, SEQ. ID NO: 1743, 

1085 SEQ ID NO: 1744, SEQ ID NO: 1745, SEQ ID NO: 1746, SEQ ID 
NO: 1747, SEQ ID NO: 1748, SEQ ID NO: 1749, SEQ ID NO: 1750,
SEQ ID NO: 1751, SEQ ID NO: 1752, SEQ ID NO: 1753, SEQ
ID NO: 1754, SEQ ID NO: 1755, SEQ ID NO: 1756, SEQ ID NO:
1757, SEQ ID NO: 1758, SEQ ID NO: 1759, SEQ ID NO: 1760,

1090 SEQ ID NO: 1761, SEQ ID NO: 1762, SEQ ID NO: 1763, SEQ ID
NO: 1764, SEQ ID NO: 1765, SEQ ID NO: 1766, SEQ ID NO: 1767, 
SEQ ID NO: 1768, SEQ ID NO: 1769, SEQ ID NO: 1770, SEQ
ID NO: 1771, SEQ ID NO: 1772, SEQ ID NO: 1773, SEQ ID NO:
1774, SEQ ID NO: 1775, SEQ ID NO: 1776, SEQ ID NO: 1777, 

1095 SEQ ID NO: 1778, SEQ ID NO: 1779, SEQ ID NO: 1780, SEQ ID 
NO: 1781, SEQ ID NO: 1782, SEQ ID NO: 1783, SEQ ID NO: 1784, 
SEQ ID NO: 1785, SEQ ID NO: 1786, SEQ ID NO: 1787, SEQ ID 
NO: 1788, SEQ ID NO: 1789, SEQ ID NO: 1790, SEQ ID NO: 1791, 
SEQ ID NO: 1792, SEQ ID NO: 1793, SEQ ID NO: 1794, SEQ ID 

1100 NO: 1795, SEQ ID NO: 1796, SEQ ID NO: 1797, SEQ ID NO: 1798, 
SEQ ID NO: 1799, SEQ ID NO: 1800, SEQ ID NO: 1801, SEQ ID 
NO: 1802, SEQ ID NO: 1803, SEQ ID NO: 1804, SEQ ID NO: 1805,
SEQ ID NO: 1806, SEQ ID NO: 1807, SEQ ID NO: 1808, SEQ
ID NO: 1809, SEQ ID NO: 1810, SEQ ID NO: 1811, SEQ ID NO:

1105 1812, SEQ ID NO: 1813, SEQ ID NO: 1814, SEQ ID NO: 1815, 
SEQ ID NO: 1816, SEQ ID NO: 1817, SEQ ID NO: 1818, SEQ ID 
NO: 1819, SEQ ID NO: 1820, SEQ ID NO: 1821, SEQ ID NO: 1822, 
SEQ ID NO: 1823, SEQ ID NO: 1824, SEQ ID NO: 1825, SEQ
ID NO: 1826, SEQ ID NO: 1827, SEQ ID NO: 1828, SEQ ID NO:

1110 1829, SEQ ID NO: 1830, SEQ ID NO: 1831, SEQ ID NO: 1832,

SEQ ID NO: 1833, SEQ ID NO: 1834, SEQ ID NO: 1835, SEQ ID 
NO: 1836, SEQ ID NO: 1837, SEQ ID NO: 1838, SEQ ID NO: 1839, 
SEQ ID NO: 1840, SEQ ID NO: 1841, SEQ ID NO: 1842, SEQ ID
NO: 1843, SEQ ID NO: 1844, SEQ ID NO: 1845, SEQ ID NO: 1846,
1115 SEQ ID NO: 1847, SEQ ID NO: 1848, SEQ ID NO: 1849, SEQ ID 
NO: 1850, SEQ ID NO: 1851, SEQ ID NO: 1852, SEQ ID NO: 1853,
SEQ ID NO: 1854, SEQ ID NO: 1855, SEQ ID NO: 1856, SEQ ID 
NO: 1857, SEQ ID NO: 1858, SEQ ID NO: 1859, SEQ ID NO: 1860,
SEQ ID NO: 1861, SEQ ID NO: 1862, SEQ ID NO: 1863, SEQ

1120 ID NO: 1864, SEQ ID NO: 1865, and SEQ ID NO: 1866,
and/or (b) a nucleotide sequence complementary to
the nucleic-acid sequence set forth in (a).

11. Use of the oligonucleotide or polynucleotide of claim 10
as a probe for hybridization or a primer for PCR.

12. Use of the oligonucleotide or polynucleotide of claim
11 for detection or diagnosis of O-157 infection.

13. A vaccine composition comprising the nucleic-acid
molecule of claim 1 or its fragment, or the oligonucleotide or
polynucleotide of claim 10, and a pharmaceutically acceptable
carrier.

14. A vaccine composition comprising the polypeptide of claim
4 or its fragment, and a pharmaceutically acceptable carrier.

15. An antibody molecule specifically recognizing the
polypeptide of claim 4.

16. A DNA microarray or DNA chip including the nucleic-acid
molecule of claim 1 and/or at least one of the oligonucleotide or
polynucleotide of claim 10.

17. Use of the DNA microarray or DNA chip for detection of
O-157 infection or classification of O-157.

18. A method of screening a compound useful for prevention
or therapy of O-157 infection and a symptom caused thereby,
using the nucleic-acid molecule of claim 1 or fragment thereof,
or the polypeptide of claim 4 or fragment thereof.

DESCRIPTION

A nucleic-acid molecule and a polypeptide specific to
enterohemorrhagic E. coli O157:H7 and a method of using the same

[0001]

INDUSTRIALLY APPLICABLE FIELD

The present invention relates to a novel nucleic-acid
molecule and a polypeptide specific to O157:H7, as well as use
thereof.

[0002]

BACKGROUND ART 

Although E. coli inhabits the large intestine of healthy
humans, most E. coli cause no particular disease. However,
some E. coli infect the human intestine and cause food
poisoning such as gastroenteritis and diarrhea. These are
referred to as pathogenic E. coli and are classified mainly into
the following 5 categories: Enterotoxigenic Escherichia coli: ETEC,
Enteroinvasive Escherichia coli: EIEC, Enteropathogenic
Escherichia coli: EPEC, Enterohemorrhagic Escherichia coli:
EHEC, and Enteroadherent Escherichia coli: EAEC.
[0003] 

EHEC includes E. coli that cause, as main symptoms, severe
abdominal pain, diarrhea and/or hematochezia and, especially in
children and the elderly, serious complications such as renal
dysfunction and haemolytic uraemic syndrome (HUS), in some
cases leading to the death of the patient. The main pathogenic
bacterium among them is O157:H7 (hereinafter referred to as
"O-157"). O-157 belongs to a serotype different from those of
the EPEC and enteroinvasive E. coli reported previously. In
addition, Riley et al. reported it as a pathogenic E. coli that
produces neither thermolabile enterotoxin (LT) nor thermostable
enterotoxin (ST) (Riley LW, et al., N. Engl. J. Med. 308 (1983),
p. 681-685). Furthermore, O-157 and EHEC are also referred to
as Verotoxin-producing Escherichia coli (VTEC), since it has
been revealed that the extracellular toxin produced by them is
Verotoxin (VT).
[0004] 

The verotoxin (VT) produced by EHEC (or VTEC) was
identified as a toxin which has potent cytotoxicity on Vero cells
(African green monkey kidney cells). O'Brien et al. (J. Infect.
Dis. 146 (1982), p. 763-769) reported that its toxicity was
neutralized by an antibody to Shiga toxin produced by the
dysentery bacillus, and referred to the toxin as Shiga-like toxin.
The verotoxin includes two major types (VT1 and VT2). Since
the verotoxins are similar to Shiga toxin, they are also referred
to as SLT1 (Shiga-Like Toxin 1) and SLT2, respectively. VT1 is
identical to Shiga toxin, or differs from it in only 1 amino acid.
VT2 has homology of approximately 56% at the amino acid level to
VT1 (Jackson M.P. et al., FEMS Microbiol. Lett. 44 (1987) p.
109-114), whereas the two have little antigenicity in common. The
verotoxin and the Shiga toxin have the same N-glycosidase
activity as ricin, which is a potent phytotoxin derived
from a plant. They act by hydrolyzing an N-glycosidic linkage at
an adenosine in the 28S ribosomal RNA of the mammalian
(eukaryotic) 60S ribosomal subunit, which inhibits binding of
aminoacyl-tRNA to the ribosome and hence protein synthesis,
thereby resulting in cell death. In particular, the verotoxins
damage vascular endothelial cells, for example in the large
intestine, and renal tubular cells, causing haemolytic uraemic
syndrome and the like.
[0005] 

As mentioned above, O-157 causes hemorrhagic colitis,
sometimes complicated by haemolytic uraemic syndrome or
encephalopathy, which endanger the patient's life. Up to
now, no effective method for inhibiting or preventing
progression to haemolytic uraemic syndrome has been
established. In addition, administration of an antibacterial
agent such as an antibiotic promotes the extracellular release of
VT, sometimes making the symptoms worse. Therefore,
definitive diagnosis at an early stage of the infection is
important.
[0006] 

Several methods are known for diagnosis of
the O-157 infection, i.e. methods for distinguishing O-157
from nonpathogenic or other pathogenic E. coli. One of them
exploits the feature that O-157:H7 differs from general E. coli
and other known EPEC in that O-157:H7 produces no
β-glucuronidase and does not ferment the sugar sorbitol, or does
so only after some delay. This method has been used widely. However,
these methods have the weak point of taking time and lacking
rapidity. Further, although the presence of O-157 capable of
degrading sorbitol has been reported, these methods cannot detect
such bacteria. On the other hand, a reversed passive latex
agglutination reaction using an antibody to the lipopolysaccharide
antigen of O-157 or an antibody to the verotoxin is known.
These methods can detect the bacterium producing VT rapidly
and conveniently, but their detection sensitivity is not
sufficient. Especially as to verotoxin, bacteria producing the
toxin are not restricted to O-157, thus these methods have a
problem [to be solved] as methods for detecting O-157.
[0007]

Further, molecular biological methods, specifically
hybridization assays and PCR assays, are performed as
methods for detecting O-157. In particular, PCR offers extremely
high detection sensitivity, high rapidity and high convenience,
resulting in its increasing use in recent years. The main target
of PCR etc. is the VT gene of VTEC such as O-157. However, as
mentioned above, E. coli other than O-157 also have the VT gene,
and furthermore, multiple mutants of the VT gene are known; thus
there is a problem [to be solved] as definitive methods for
diagnosis of O-157. Moreover, although pulsed-field gel
electrophoresis (PFGE) is used for detection of O-157, the
apparatus required for performing this method is expensive,
and the method requires a long time for detection and
considerably skilled technique. In addition, the number of
strains which can be analysed at once is limited, and comparison
of O-157 data between different institutions is not easy. Therefore,
there is a need for a method which is rapid and convenient and
offers high detection sensitivity, high confidence and ease of
comparison and exchange of data between different institutions.
[0008] 

On the other hand, although antibacterial agents
considered to be effective against O-157, such as antibiotics, are
known, the presence of drug-resistant bacteria has also been
reported. In addition, as mentioned above, VT is released into the
extracellular space upon administration of antibiotics, sometimes
making the patient's symptoms worse. Therefore,
there is a need for development of a method for therapy of
infectious disease caused by O-157 that differs from therapy using
these antibacterial agents, a method for therapy
and/or prevention of the symptoms caused by VT, and detailed
genetic information on O-157 which may serve as guidance
thereto.
[0009] 

PROBLEMS TO BE SOLVED BY THE INVENTION

Accordingly, the object of the present invention is to provide a
nucleic-acid molecule, a polypeptide, genetic information
thereof and a method of using them which may be useful for
detection and therapy of enterohemorrhagic pathogenic E. coli
O-157:H7 infection.
[0010] 

Means To Solve The Problem 

We have found genetic information specific to O-157:H7
which is not present in other E. coli including nonpathogenic E.
coli by analyzing the whole genetic information of
enterohemorrhagic pathogenic E. coli O-157:H7 Sakai (RIMD
0509952). Therefore, the present invention relates to the
genetic information specific to O-157:H7 and the use thereof.
The genetic information includes, but is not restricted to, a
nucleotide sequence on the genome, a gene, a polypeptide encoded
thereby, an amino acid sequence thereof and the like.
[0011] 

Therefore, the present invention relates to a nucleic-acid
molecule specific to enterohemorrhagic pathogenic E. coli
O-157:H7. In a preferred embodiment, the present invention
relates to a nucleic-acid molecule having

(a) a nucleotide sequence selected from a group 
comprising the following SEQ IDs: SEQ ID NO: 1, SEQ ID NO: 
132, SEQ ID NO: 244, SEQ ID NO : 337, SEQ ID NO: 410, SEQ ID 

1295 NO: 484, SEQ ID NO : 554, SEQ ID NO: 630, SEQ ID NO : 689, 
SEQ ID NO: 755, SEQ ID NO: 816, SEQ ID NO: 876, SEQ ID NO: 
927, SEQID NO: 978, SEQ ID NO : 1013, SEQ ID NO: 1029, SEQ 
ID NO: 1055, SEQ ID NO: 1060, SEQ ID NO: 1093, SEQ ID NO: 
1128, SEQ ID NO: 1157, SEQ ID NO: 1191, SEQ ID NO: 1212, 

1300 SEQ ID NO: 1240, SEQ ID NO: 1258, SEQ ID NO: 1274, SEQ ID 
NO: 1288, SEQ. ID NO : 1302, SEQ ID NO: 1309, SEQ ID NO : 1321, 
SEQID NO: 1329, SEQ ID NO: 1338, SEQ ID NO: 1348, SEQ ID 
NO: 1359, SEQ ID NO : 1366, SEQ ID NO: 1374, SEQ. ID NO: 1380, 
SEQ ID NO: 1386, SEQ ID NO: 1394, SEQ ID NO: 1401, SEQ ID 

1305 NO : 1408, SEQ ID NO : 1411, SEQ ID NO : 1418, SEQ ID NO : 1426, 
SEQ ID NO: 1436, SEQ ID NO: 1443, SEQ ID NO: 1450, SEQID 
NO: 1457, SEQ. ID NO: 1460, SEQ ID NO: 1467, SEQ ID NO: 1471, 
SEQ IDNO: 1473, SEQ ID NO: 1478, SEQ ID NO: 1487, SEQ ID 
NO: 1489, SEQ ID NO: 1494, SEQ ID NO: 1499, SEQ, ID NO: 

1310 1501, SEQ ID NO: 1506, SEQ ID NO: 1508, SEQ. ID NO: 1510, 
SEQ ID NO: 1511, SEQ. ID NO: 1516, SEQ ID NO: 1520, SEQ ID 
NO: 1526, SEQ. ID NO: 1532, SEQ ID NO: 1537, SEQ ID NO: 1540, 
SEQ ID NO: 1545, SEQ. ID NO: 1547, SEQ. ID NO: 1549, SEQ ID 
NO: 1551, SEQ. ID NO: 1553, SEQ ID NO: 1555, SEQ ID NO: 1558, 
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1315 SEQ ID NO: 1563, SEQ ID NO : 1566, SEQ ID NO: 1569, SEQ ID 
NO: 1571, SEQ ID NO : 1576, SEQ ID NO: 1580, SEQ ID NO: 1584, 
SEQ ID NO: 1587, SEQ ID NO: 1591, SEQ ID NO: 1594, SEQID 
NO: 1596, SEQ. ID NO: 1599, SEQ ID NO: 1601, SEQ ID NO: 1603, 
SEQ ID NO: 1604, SEQ ID NO: 1605, SEQ ID NO: 1607, SEQ ID 

1 320 NO : 1612, SEQ ID NO: 1615, SEQ ID NO: 1617, SEQ ID NO: 1619, 
SEQ ID NO: 1622, SEQ ID NO: 1624, SEQ ID NO: 1626, SEQ ID 
NO: 1627, SEQ ID NO: 1629, SEQ ID NO: 1632, SEQID NO: 1635, 
SEQ ID NO: 1636, SEQ ID NO: 1637, SEQ ID NO: 1639, SEQ
ID NO: 1640, SEQ ID NO: 1643, SEQ ID NO: 1646, SEQ ID NO:

1325 1649, SEQ ID NO: 1652, SEQ. ID NO: 1655, SEQ ID NO: 1658, 
SEQ ID NO: 1660, SEQ ID NO: 1662, SEQ ID NO: 1664, SEQ ID 
NO: 1666, SEQ ID NO: 1668, SEQ ID NO: 1669, SEQ ID NO: 1670, 
SEQ ID NO: 1672, SEQ ID NO: 1673, SEQ ID NO: 1675, SEQ 
IDNO : 1677, SEQ ID NO: 1680, SEQ ID NO: 1682, SEQ. ID NO: 

1330 1683, SEQ ID NO: 1685, SEQ ID NO: 1688, SEQ. ID NO: 1690, 
SEQ ID NO: 1691, SEQ ID NO: 1694, SEQ ID NO: 1696, SEQ ID 
NO: 1699, SEQ ID NO: 1700, SEQ ID NO: 1701, SEQ ID NO: 1704, 
SEQ ID NO: 1705, SEQ. ID NO: 1706, SEQ. ID NO : 1707, SEQID 
NO: 1708, SEQ ID NO: 1709, SEQ. ID NO: 1710, SEQ. ID NO: 1711, 

1 33 5 SEQ ID NO: 1712, SEQ ID NO: 1713, SEQ. ID NO: 1715, SEQ ID 
NO: 1716, SEQ ID NO: 1717, SEQ ID NO: 1718, SEQ ID NO:
1719, SEQ ID NO: 1720, SEQ ID NO: 1721, SEQ. ID NO: 1722, 
SEQ ID NO: 1723, SEQ ID NO: 1724, SEQ ID NO: 1725, SEQ ID 
NO: 1726, SEQ ID NO: 1727, SEQ ID NO: 1728, SEQ ID NO: 1729, 

1340 SEQ IDNO: 1730, SEQ ID NO: 1731, SEQ. ID NO: 1732, SEQ. ID 
NO: 1733, SEQ. ID NO: 1734, SEQ ID NO: 1735, SEQ ID NO: 1736, 
SEQ ID NO: 1737, SEQ ID NO: 1738, SEQ ID NO: 1739, SEQ ID 
NO: 1740, SEQ ID NO : 1741, SEQ ID NO: 1742, SEQ ID NO: 1743, 
SEQ ID NO: 1744, SEQ ID NO: 1745, SEQ ID NO: 1746, SEQID 

1345 NO: 1747, SEQ ID NO: 1748, SEQ ID NO: 1749, SEQ. ID NO: 1750, 
SEQ ID NO: 1751, SEQ ID NO : 1752, SEQ ID NO: 1753, SEQ ID 
NO: 1754, SEQ. ID NO: 1755, SEQ. ID NO: 1756, SEQ ID NO: 1757, 
SEQ ID NO: 1758, SEQ ID NO: 1759, SEQ ID NO: 1760, SEQ ID 
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NO: 1761, SEQ ID NO : 1762, SEQ ID NO: 1763, SEQID NO: 1764, 

1350 SEQ ID NO: 1765, SEQ ID NO: 1766, SEQ ID NO: 1767, SEQ 
IDNO: 1768, SEQ. ID NO: 1769, SEQ ID NO: 1770, SEQ ID NO: 
1771, SEQ ID NO: 1772, SEQ. ID NO: 1773, SEQ ID NO: 1774, 
SEQ ID NO: 1775, SEQ ID NO : 1776, SEQ ID NO: 1777, SEQ ID 
NO: 1778, SEQ ID NO: 1779, SEQ ID NO: 1780, SEQ. ID NO: 1781, 

1 355 SEQ ID NO: 1782, SEQ. ID NO: 1783, SEQ. ID NO: 1784, SEQ 
IDNO: 1785, SEQ ID NO: 1786, SEQ ID NO: 1787, SEQ. ID NO: 
1788, SEQ ID NO: 1789, SEQ ID NO: 1790, SEQ ID NO: 1791, 
SEQ ID NO: 1792, SEQ ID NO: 1793, SEQ ID NO: 1794, SEQ ID
NO: 1795, SEQ. ID NO: 1796, SEQ, ID NO: 1797, SEQ ID NO: 1798, 

1360 SEQ ID NO: 1799, SEQ ID NO: 1800, SEQ ID NO: 1801, SEQID 
NO: 1802, SEQ ID NO: 1803, SEQ ID NO: 1804, SEQ ID NO: 1805, 
SEQ ID NO: 1806, SEQ ID NO: 1807, SEQ ID NO: 1808, SEQ ID 
NO: 1809, SEQ ID NO: 1810, SEQ ID NO: 1811, SEQ ID NO: 1812, 
SEQ ID NO: 1813, SEQ ID NO: 1814, SEQ ID NO: 1815, SEQ ID

1365 NO: 1816, SEQ ID NO: 1817, SEQ ID NO: 1818, SEQID NO: 1819, 
SEQ ID NO: 1820, SEQ ID NO: 1821, SEQ ID NO: 1822, SEQ 
IDNO: 1823, SEQ ID NO: 1824, SEQ ID NO: 1825, SEQ ID NO: 
1826, SEQ ID NO: 1827, SEQ ID NO: 1828, SEQ ID NO: 1829, 
SEQ ID NO: 1830, SEQ ID NO: 1831, SEQ. ID NO : 1832, SEQ ID 

1 370 NO: 1833, SEQ ID NO: 1834, SEQ ID NO: 1835, SEQ. ID NO: 1836, 
SEQ ID NO: 1837, SEQ ID NO: 1838, SEQ. ID NO: 1839, SEQ. 
IDNO: 1840, SEQ ID NO: 1841, SEQ ID NO: 1842, SEQ ID NO: 
1843, SEQ ID NO: 1844, SEQ ID NO : 1845, SEQ ID NO: 1846, 
SEQ ID NO: 1847, SEQ. ID NO: 1848, SEQ. ID NO: 1849, SEQ ID 

1375 NO: 1850, SEQ. ID NO: 1851, SEQ ID NO: 1852, SEQ ID NO: 1853, 
SEQ ID NO: 1854, SEQ ID NO: 1855, SEQ ID NO: 1856, SEQID 
NO: 1857, SEQ ID NO: 1858, SEQ ID NO : 1859, SEQ. ID NO: 1860, 
SEQ ID NO: 1861, SEQ ID NO: 1862, SEQ ID NO: 1863, SEQ ID 
NO: 1864, SEQ ID NO: 1865, and SEQ ID NO: 1866;

(b) a partial sequence of the nucleotide sequences set
forth in (a);

(c) a nucleotide sequence complementary to the
nucleotide sequence set forth in (a) or (b); or

(d) a nucleotide sequence hybridizing to the nucleotide
sequences set forth in (a), (b) or (c) under a stringent condition.

These nucleic-acid molecules of the present invention
include a large number of O-157 specific genes, [wherein] the
genes encode proteins or polypeptides specific to O-157.
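
As an illustration only, the relationship between a listed nucleotide sequence, a partial sequence of it (item (b)) and its complementary sequence (item (c)) can be sketched in a few lines of Python. The function names and the example sequence below are hypothetical and are not taken from the specification; hybridization under a stringent condition (item (d)) depends on experimental conditions and is not modeled here.

```python
# Illustrative sketch only: helper names and the example sequence are
# assumptions, not part of the specification or its sequence listing.

# Translation table mapping each DNA base to its complementary base.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand read 5'->3' (cf. item (c))."""
    return seq.translate(COMPLEMENT)[::-1]


def partial_sequence(seq: str, start: int, length: int) -> str:
    """Return a contiguous fragment of seq (cf. item (b)); start is 0-based."""
    if start < 0 or length <= 0 or start + length > len(seq):
        raise ValueError("requested fragment lies outside the sequence")
    return seq[start:start + length]


example = "ATGGCGTTA"  # made-up sequence, not a SEQ ID of the specification
fragment = partial_sequence(example, 3, 4)
complementary = reverse_complement(example)
print(fragment, complementary)
```

A sequence falling under item (c) for a fragment can be obtained in this sketch by passing the output of partial_sequence to reverse_complement.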
[0012] 

Accordingly, the present invention relates to a
nucleic-acid molecule which is a nucleic-acid molecule encoding
a polypeptide specific to enterohemorrhagic pathogenic E. coli
O-157:H7 and encodes

(a) an amino acid sequence selected from a group 

comprising the following SEQ IDs or a fragment thereof: SEQ ID NO: 2,
SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6,
SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10,
SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14,
SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18,
SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 22,
SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26,
SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30,
SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34,
SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38,
SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 41, SEQ ID NO: 42,
SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45, SEQ ID NO: 46,
SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 50,
SEQ ID NO: 51, SEQ ID NO: 52, SEQ ID NO: 53, SEQ ID NO: 54,
SEQ ID NO: 55, SEQ ID NO: 56, SEQ ID NO: 57, SEQ ID NO: 58,
SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, SEQ ID NO: 62,
SEQ ID NO: 63, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66,
SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, SEQ ID NO: 70,
SEQ ID NO: 71, SEQ ID NO: 72, SEQ ID NO: 73, SEQ ID NO: 74,
SEQ ID NO: 75, SEQ ID NO: 76, SEQ ID NO: 77, SEQ ID NO: 78,
SEQ ID NO: 79, SEQ ID NO: 80, SEQ ID NO: 81, SEQ ID NO: 82,
SEQ ID NO: 83, SEQ ID NO: 84, SEQ ID NO: 85, SEQ ID NO: 86,
SEQ ID NO: 87, SEQ ID NO: 88, SEQ ID NO: 89, SEQ ID NO: 90,
SEQ ID NO: 91, SEQ ID NO: 92, SEQ ID NO: 93, SEQ ID NO: 94,
SEQ ID NO: 95, SEQ ID NO: 96, SEQ ID NO: 97, SEQ ID

1420 NO: 98, SEQ ID NO : 99, SEQ. ID NO: 100, SEQ ID NO: 101, SEQ 
ID NO: 102, SEQ ID NO : 103, SEQ ID NO: 104, SEQ ID NO: 105, 
SEQ ID NO: 106, SEQ ID NO: 107, SEQ ID NO: 108, SEQ ID NO: 
109, SEQ ID NO: 110, SEQ ID NO: 111, SEQ. ID NO: 112, SEQ. ID 
NO: 113, SEQ ID NO: 114, SEQ ID NO: 115, SEQ ID NO: 116, 

1425 SEQ ID NO: 117, SEQ ID NO: 118, SEQ ID NO: 119, SEQ ID NO: 
120, SEQ ID NO: 121, SEQ ID NO: 122, SEQ ID NO: 123, SEQ ID 
NO: 124, SEQ ID NO: 125, SEQ. ID NO: 126, SEQ ID NO: 127, 
SEQ ID NO: 128, SEQ ID NO : 129, SEQ ID NO: 130, SEQ ID NO: 
131, SEQ ID NO : 133, SEQ ID NO : 134, SEQ ID NO: 135, SEQ ID 

NO: 136, SEQ ID NO: 137, SEQ ID NO: 138, SEQ ID NO: 139,
SEQ ID NO: 140, SEQ, ID NO: 141, SEQ, ID NO: 142, SEQ ID NO: 
143, SEQ ID NO: 144, SEQ ID NO: 145, SEQ ID NO: 146, SEQ ID 
NO: 147, SEQ ID NO : 148, SEQ ID NO: 149, SEQ ID NO: 150, 
SEQ ID NO: 151, SEQ ID NO : 152, SEQ ID NO : 153, SEQ ID NO: 

1435 154, SEQ, ID NO : 155, SEQ. ID NO: 156, SEQ, ID NO: 157, SEQ ID 
NO: 158, SEQ ID NO: 159, SEQ. ID NO: 160, SEQ ID NO: 161, 
SEQ ID NO: 162, SEQ ID NO: 163, SEQ ID NO: 164, SEQ ID NO: 
165, SEQ ID NO: 166, SEQ ID NO: 167, SEQ ID NO: 168, SEQ ID 
NO: 169, SEQ ID NO: 170, SEQ ID NO : 171, SEQ ID NO: 172, 

1440 SEQ ID NO: 173, SEQ. ID NO: 174, SEQ. ID NO: 175, SEQ ID NO: 
176, SEQ ID NO: 177, SEQ ID NO: 178, SEQ ID NO: 179, SEQ ID 
NO: 180, SEQ ID NO: 181, SEQ ID NO: 182, SEQ ID NO: 183, 
SEQ ID NO: 184, SEQ ID NO: 185, SEQ ID NO: 186, SEQ ID NO: 
187, SEQ ID NO: 188, SEQ ID NO: 189, SEQ ID NO: 190, SEQ ID 

1445 NO: 191, SEQ ID NO: 192, SEQ ID NO: 193, SEQ ID NO: 194, 
SEQ ID NO: 195, SEQ, ID NO: 196, SEQ. ID NO: 197, SEQ ID NO: 
198, SEQ ID NO: 199, SEQ ID NO:200, SEQ ID NO:201, SEQ ID 
NO: 202, SEQ ID NO: 203, SEQ. ID NO : 204, SEQ ID NO : 205, 
SEQ ID NO: 206, SEQ ID NO: 207, SEQ ID NO: 208, SEQ ID NO: 

1450 209, SEQ. ID NO: 210, SEQ. ID NO: 211, SEQ ID NO: 212, SEQ ID 
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NO: 213, SEQ ID NO: 214, SEQ ID NO : 215, SEQ ID NO: 216, 
SEQ ID NO: 217, SEQ ID NO : 218, SEQ ID NO: 219, SEQ ID NO: 
220, SEQ ID NO: 221, SEQ ID NO: 222, SEQ ID NO: 223, SEQ ID 
NO: 224, SEQ ID NO: 225, SEQ ID NO: 226, SEQ ID NO: 227,

1455 SEQ ID NO: 228, SEQ ID NO: 229, SEQ ID NO: 230, SEQ ID NO: 
231, SEQ ID NO: 232, SEQ ID NO: 233, SEQ ID NO: 234, SEQ ID
NO: 235, SEQ ID NO : 236, SEQ ID NO: 237, SEQ ID NO: 238, 
SEQ ID NO: 239, SEQ ID NO: 240, SEQ ID NO: 241, SEQ ID NO: 
242, SEQ ID NO: 243, SEQ ID NO: 245, SEQ ID NO: 246, SEQ ID 

1460 NO: 247, SEQ ID NO : 248, SEQ ID NO: 249, SEQ ID NO: 250, 
SEQ ID NO: 251, SEQ ID NO : 252, SEQ ID NO: 253, SEQ ID NO: 
254, SEQ ID NO: 255, SEQ ID NO: 256, SEQ ID NO: 257, SEQ ID 
NO: 258, SEQ ID NO: 259, SEQ ID NO: 260, SEQ ID NO: 261, 
SEQ ID NO: 262, SEQ ID NO: 263, SEQ ID NO: 264, SEQ ID NO: 

1465 265, SEQ ID NO: 266, SEQ ID NO: 267, SEQ ID NO: 268, SEQ ID 
NO: 269, SEQ ID NO: 270, SEQ ID NO: 271, SEQ ID NO: 272, 
SEQ ID NO: 273, SEQ ID NO: 274, SEQ ID NO: 275, SEQ ID NO:
276, SEQ ID NO: 277, SEQ ID NO: 278, SEQ ID NO: 279, SEQ ID 
NO: 280, SEQ ID NO: 281, SEQ ID NO: 282, SEQ ID NO: 283, 

1470 SEQ ID NO:284, SEQ ID NO:285, SEQ ID NO: 286, SEQ ID NO: 
287, SEQ ID NO: 288, SEQ ID NO: 289, SEQ ID NO: 290, SEQ ID 
NO: 291, SEQ ID NO: 292, SEQ ID NO: 293, SEQ ID NO: 294, 
SEQ ID NO: 295, SEQ ID NO: 296, SEQ ID NO: 297, SEQ ID NO: 
298, SEQ ID NO: 299, SEQ ID NO: 300, SEQ ID NO: 301, SEQ ID 

1475 NO: 302, SEQ ID NO: 303, SEQ ID NO : 304, SEQ ID NO : 305, 
SEQ ID NO: 306, SEQ ID NO: 307, SEQ ID NO : 308, SEQ ID NO: 
309, SEQ. ID NO: 310, SEQ. ID NO : 311, SEQ ID NO: 312, SEQ ID 
NO: 313, SEQ ID NO : 314, SEQ ID NO: 315, SEQ ID NO: 316, 
SEQ ID NO: 317, SEQ ID NO: 318, SEQ ID NO: 319, SEQ ID NO: 

1480 320, SEQ ID NO: 321, SEQ ID NO: 322, SEQ ID NO: 323, SEQ ID 
NO: 324, SEQ ID NO : 325, SEQ ID NO: 326, SEQ ID NO: 327, 
SEQ ID NO: 328, SEQ ID NO: 329, SEQ ID NO: 330, SEQ ID NO: 
331, SEQ. ID NO: 332, SEQ. ID NO: 333, SEQ. ID NO: 334, SEQ ID 
NO: 335, SEQ ID NO : 336, SEQ. ID NO : 338, SEQ ID NO : 339, 
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1485 SEQ ID NO: 340, SEQ. ID NO: 341, SEQ ID NO: 342, SEQ ID NO: 
343, SEQ ID NO: 344, SEQ ID NO: 345, SEQ ID NO: 346, SEQ ID 
NO: 347, SEQ ID NO : 348, SEQ ID NO: 349, SEQ ID NO : 350, 
SEQ ID NO: 351, SEQ ID NO: 352, SEQ ID NO: 353, SEQ ID NO: 
354, SEQ ID NO: 355, SEQ ID NO: 356, SEQ ID NO: 357, SEQ ID 

1490 NO: 358, SEQ ID NO: 359, SEQ ID NO: 360, SEQ ID NO : 361, 
SEQ ID NO: 362, SEQ ID NO: 363, SEQ ID NO : 364, SEQ ID NO: 
365, SEQ ID NO: 366, SEQ ID NO: 367, SEQ ID NO: 368, SEQ ID 
NO: 369, SEQ ID NO: 370, SEQ ID NO: 371, SEQ ID NO: 372,
SEQ ID NO: 373, SEQ ID NO: 374, SEQ ID NO: 375, SEQ ID NO: 

1495 376, SEQ ID NO: 377, SEQ. ID NO: 378, SEQ ID NO: 379, SEQ ID 
NO: 380, SEQ ID NO: 381, SEQ ID NO: 382, SEQ ID NO : 383, 
SEQ ID NO: 384, SEQ ID NO: 385, SEQ ID NO: 386, SEQ ID NO: 
387, SEQ ID NO: 388, SEQ ID NO: 389, SEQ ID NO: 390, SEQ ID 
NO: 391, SEQ ID NO: 392, SEQ ID NO: 393, SEQ ID NO : 394, 

1500 SEQ ID NO: 395, SEQ ID NO: 396, SEQ ID NO: 397, SEQ ID NO: 
398, SEQ ID NO: 399, SEQ ID NO: 400, SEQ ID NO: 401, SEQ ID 
NO: 402, SEQ ID NO : 403, SEQ ID NO : 404, SEQ ID NO: 405, 
SEQ ID NO: 406, SEQ ID NO: 407, SEQ ID NO: 408, SEQ ID NO: 
409, SEQ ID NO: 411, SEQ ID NO: 412, SEQ ID NO: 413, SEQ ID 

1505 NO: 414, SEQ ID NO: 415, SEQ ID NO: 416, SEQ ID NO: 417, 
SEQ ID NO: 418, SEQ ID NO: 419, SEQ ID NO: 420, SEQ ID NO: 
421, SEQ ID NO: 422, SEQ ID NO: 423, SEQ ID NO: 424, SEQ ID 
NO: 425, SEQ ID NO: 426, SEQ ID NO: 427, SEQ ID NO: 428, 
SEQ ID NO: 429, SEQ ID NO: 430, SEQ ID NO: 431, SEQ ID NO:

1510 432, SEQ ID NO: 433, SEQ ID NO : 434, SEQ ID NO: 435, SEQ 
ID NO: 436, SEQ ID NO: 437, SEQ ID NO: 438, SEQ ID NO: 439, 
SEQ ID NO: 440, SEQ ID NO: 441, SEQ ID NO: 442, SEQ ID NO: 
443, SEQ ID NO: 444, SEQ ID NO : 445, SEQ ID NO: 446, SEQ ID 
NO: 447, SEQ ID NO: 448, SEQ ID NO: 449, SEQ ID NO : 450, 

1515 SEQ ID NO: 451, SEQ. ID NO:452, SEQ ID NO:453, SEQ ID NO: 
454, SEQ ID NO: 455, SEQ. ID NO: 456, SEQ. ID NO: 457, SEQ ID 
NO: 458, SEQ ID NO: 459, SEQ. ID NO: 460, SEQ ID NO: 461, 
SEQ ID NO: 462, SEQ ID NO: 463, SEQ ID NO: 464, SEQ ID NO: 
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465, SEQ ID NO: 466, SEQ ID NO: 467, SEQ ID NO: 468, SEQ ID 

1520 NO: 469, SEQ ID NO: 470, SEQ ID NO: 471, SEQ ID NO: 472, 
SEQ ID NO: 473, SEQ ID NO: 474, SEQ ID NO: 475, SEQ ID NO: 
476, SEQ. ID NO: 477, SEQ. ID NO: 478, SEQ ID NO: 479, SEQ ID 
NO: 480, SEQ ID NO: 481, SEQ ID NO: 482, SEQ ID NO: 483,
SEQ ID NO: 485, SEQ ID NO : 486, SEQ ID NO: 487, SEQ ID NO: 

1525 488, SEQ ID NO: 489, SEQ ID NO:490, SEQ ID NO:491, SEQ ID 
NO: 492, SEQ ID NO: 493, SEQ ID NO : 494, SEQ ID NO: 495, 
SEQ ID NO: 496, SEQ ID NO: 497, SEQ ID NO: 498, SEQ ID NO:
499, SEQ ID NO: 500, SEQ ID NO: 501, SEQ ID NO: 502, SEQ ID 
NO: 503, SEQ ID NO: 504, SEQ. ID NO: 505, SEQ ID NO : 506, 

1530 SEQ ID NO: 507, SEQ ID NO: 508, SEQ ID NO: 509, SEQ ID NO: 
510, SEQ ID NO: 511, SEQ ID NO: 512, SEQ ID NO: 513, SEQ ID
NO: 514, SEQ ID NO: 515, SEQ ID NO: 516, SEQ ID NO: 517, 
SEQ ID NO: 518, SEQ ID NO: 519, SEQ ID NO: 520, SEQ ID NO: 
521, SEQ ID NO:522, SEQ ID NO:523, SEQ ID NO:524, SEQ ID 

1535 NO: 525, SEQ ID NO: 526, SEQ ID NO: 527, SEQ ID NO: 528, 
SEQ ID NO: 529, SEQ ID NO: 530, SEQ ID NO: 531, SEQ ID NO: 
532, SEQ ID NO: 533, SEQ. ID NO : 534, SEQ ID NO: 535, SEQ ID 
NO: 536, SEQ ID NO: 537, SEQ ID NO : 538, SEQ ID NO : 539, 
SEQ ID NO : 540, SEQ ID NO : 541, SEQ ID NO: 542, SEQ ID NO: 

1540 543, SEQ ID NO: 544, SEQ ID NO: 545, SEQ ID NO: 546, SEQ ID 
NO: 547, SEQ ID NO : 548, SEQ ID NO: 549, SEQ ID NO : 550, 
SEQ ID NO: 551, SEQ. ID NO : 552, SEQ ID NO: 553, SEQ ID NO: 
555, SEQ ID NO: 556, SEQ ID NO: 557, SEQ ID NO: 558, SEQ ID 
NO: 559, SEQ ID NO: 560, SEQ. ID NO: 561, SEQ ID NO: 562, 

1545 SEQ ID NO: 563, SEQ ID NO: 564, SEQ ID NO: 565, SEQ ID NO: 
566, SEQ ID NO: 567, SEQ ID NO: 568, SEQ ID NO: 569, SEQ ID 
NO: 570, SEQ ID NO: 571, SEQ ID NO: 572, SEQ ID NO: 573, 
SEQ ID NO: 574, SEQ. ID NO: 575, SEQ ID NO: 576, SEQ ID NO: 
577, SEQ ID NO: 578, SEQ ID NO: 579, SEQ ID NO: 580, SEQ ID 

1550 NO: 581, SEQ ID NO: 582, SEQ ID NO: 583, SEQ ID NO : 584, 
SEQ ID NO: 585, SEQ ID NO: 586, SEQ ID NO: 587, SEQ ID NO: 
588, SEQ. ID NO: 589, SEQ. ID NO: 590, SEQ ID NO: 591, SEQ ID 
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NO: 592, SEQ ID NO : 593, SEQ ID NO : 594, SEQ ID NO : 595, 
SEQ ID NO: 596, SEQ ID NO: 597, SEQ ID NO: 598, SEQ ID NO: 

1555 599, SEQ ID NO : 600, SEQ ID NO: 601, SEQ ID NO: 602, SEQ ID 
NO: 603, SEQ ID NO: 604, SEQ. ID NO: 605, SEQ ID NO : 606, 
SEQ ID NO: 607, SEQ ID NO: 608, SEQ ID NO: 609, SEQ ID NO: 
610, SEQ ID NO: 611, SEQ ID NO: 612, SEQ ID NO: 613, SEQ ID
NO: 614, SEQ ID NO: 615, SEQ ID NO: 616, SEQ ID NO: 617, 

1560 SEQ ID NO: 618, SEQ ID NO: 619, SEQ ID NO: 620, SEQ ID NO: 
621, SEQ ID NO: 622, SEQ ID NO: 623, SEQ ID NO: 624, SEQ ID 
NO: 625, SEQ ID NO: 626, SEQ ID NO: 627, SEQ ID NO: 628, 
SEQ ID NO: 629, SEQ ID NO: 631, SEQ ID NO: 632, SEQ ID NO:
633, SEQ ID NO: 634, SEQ ID NO: 635, SEQ ID NO: 636, SEQ ID 

1565 NO: 637, SEQ ID NO: 638, SEQ ID NO: 639, SEQ ID NO : 640, 
SEQ ID NO: 641, SEQ ID NO: 642, SEQ ID NO: 643, SEQ ID NO: 
644, SEQ ID NO: 645, SEQ ID NO: 646, SEQ ID NO: 647, SEQ ID
NO: 648, SEQ ID NO : 649, SEQ ID NO: 650, SEQ ID NO: 651, 
SEQ ID NO: 652, SEQ ID NO : 653, SEQ ID NO: 654, SEQ ID NO: 

1570 655, SEQ ID NO: 656, SEQ ID NO: 657, SEQ ID NO: 658, SEQ ID 
NO: 659, SEQ ID NO: 660, SEQ ID NO: 661, SEQ ID NO : 662, 
SEQ ID NO: 663, SEQ ID NO: 664, SEQ ID NO: 665, SEQ ID NO: 
666, SEQ ID NO: 667, SEQ ID NO: 668, SEQ ID NO: 669, SEQ ID 
NO: 670, SEQ ID NO: 671, SEQ ID NO: 672, SEQ ID NO : 673, 

1 575 SEQ ID NO: 674, SEQ ID NO: 675, SEQ ID NO: 676, SEQ ID NO: 
677, SEQ ID NO : 678, SEQ ID NO: 679, SEQ ID NO: 680, SEQ ID 
NO: 681, SEQ ID NO: 682, SEQ ID NO: 683, SEQ ID NO: 684, 
SEQ ID NO: 685, SEQ ID NO: 686, SEQ ID NO: 687, SEQ ID NO: 
688, SEQ ID NO: 690, SEQ ID NO : 691, SEQ ID NO: 692, SEQ ID 

1580 NO: 693, SEQ ID NO: 694, SEQ ID NO: 695, SEQ ID NO : 696, 
SEQ ID NO: 697, SEQ ID NO: 698, SEQ ID NO: 699, SEQ ID
NO: 700, SEQ ID NO: 701, SEQ ID NO: 702, SEQ ID NO: 703, SEQ
ID NO: 704, SEQ ID NO: 705, SEQ ID NO: 706, SEQ. ID NO: 707, 
SEQ ID NO: 708, SEQ ID NO: 709, SEQ ID NO: 710, SEQ ID NO: 

1585 711, SEQ ID NO: 712, SEQ ID NO: 713, SEQ ID NO: 714, SEQ ID 
NO: 715, SEQ ID NO: 716, SEQ. ID NO: 717, SEQ ID NO: 718, 
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SEQ ID NO: 719, SEQ ID NO : 720, SEQ ID NO: 721, SEQ ID NO: 
722, SEQ ID NO: 723, SEQ ID NO: 724, SEQ ID NO: 725, SEQ ID 
NO: 726, SEQ ID NO: 727, SEQ ID NO: 728, SEQ ID NO: 729, 

SEQ ID NO: 730, SEQ ID NO: 731, SEQ ID NO: 732, SEQ ID NO:
733, SEQ ID NO: 734, SEQ ID NO: 735, SEQ ID NO: 736, SEQ ID 
NO: 737, SEQ ID NO: 738, SEQ ID NO: 739, SEQ ID NO : 740, 
SEQ ID NO: 741, SEQ ID NO: 742, SEQ ID NO: 743, SEQ ID NO: 
744, SEQ ID NO: 745, SEQ ID NO:746, SEQ ID NO:747, SEQ ID 

1595 NO: 748, SEQ ID NO: 749, SEQ ID NO: 750, SEQ ID NO: 751, 
SEQ ID NO: 752, SEQ ID NO: 753, SEQ ID NO: 754, SEQ ID NO: 
756, SEQ ID NO: 757, SEQ. ID NO: 758, SEQ ID NO: 759, SEQ ID 
NO: 760, SEQ ID NO: 761, SEQ ID NO: 762, SEQ ID NO : 763, 
SEQ ID NO: 764, SEQ ID NO: 765, SEQ ID NO: 766, SEQ ID NO: 

1600 767, SEQ ID NO: 768, SEQ ID NO: 769, SEQ ID NO: 770, SEQ ID 
NO: 771, SEQ ID NO: 772, SEQ ID NO: 773, SEQ ID NO: 774, 
SEQ ID NO: 775, SEQ ID NO: 776, SEQ ID NO: 777, SEQ ID NO: 
778, SEQ ID NO: 779, SEQ ID NO: 780, SEQ ID NO: 781, SEQ ID 
NO: 782, SEQ ID NO : 783, SEQ ID NO : 784, SEQ ID NO : 785, 

1605 SEQ ID NO: 786, SEQ ID NO: 787, SEQ ID NO: 788, SEQ ID NO: 
789, SEQ ID NO: 790, SEQ. ID NO: 791, SEQ. ID NO: 792, SEQ ID 
NO: 793, SEQ ID NO: 794, SEQ ID NO: 795, SEQ ID NO : 796, 
SEQ ID NO: 797, SEQ ID NO: 798, SEQ ID NO: 799, SEQ ID NO: 
800, SEQ ID NO: 801, SEQ ID NO: 802, SEQ ID NO: 803, SEQ ID 

1610 NO: 804, SEQ ID NO : 805, SEQ ID NO : 806, SEQ ID NO: 807, 
SEQ ID NO: 808, SEQ ID NO : 809, SEQ ID NO: 810, SEQ ID NO: 
811, SEQ ID NO: 812, SEQ ID NO: 813, SEQ ID NO: 814, SEQ ID 
NO: 815, SEQ ID NO: 817, SEQ ID NO: 818, SEQ ID NO: 819, 
SEQ ID NO: 820, SEQ ID NO: 821, SEQ ID NO:822, SEQ ID NO: 

1615 823, SEQ ID NO: 824, SEQ ID NO: 825, SEQ ID NO: 826, SEQ ID 
NO: 827, SEQ ID NO: 828, SEQ ID NO: 829, SEQ ID NO : 830, 
SEQ ID NO: 831, SEQ. ID NO: 832, SEQ ID NO: 833, SEQ ID NO: 
834, SEQ ID NO: 835, SEQ. ID NO: 836, SEQ ID NO: 837, SEQ ID 
NO: 838, SEQ ID NO : 839, SEQ. ID NO: 840, SEQ ID NO: 841, 

1620 SEQ ID NO: 842, SEQ ID NO: 843, SEQ ID NO : 844, SEQ ID NO: 
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845, SEQ ID NO: 846, SEQ ID NO: 847, SEQ ID NO: 848, SEQ ID 
NO: 849, SEQ ID NO: 850, SEQ ID NO: 851, SEQ ID NO: 852, 
SEQ ID NO: 853, SEQ ID NO: 854, SEQ ID NO: 855, SEQ ID NO: 
856, SEQ. ID NO: 857, SEQ. ID NO: 858, SEQ ID NO: 859, SEQ ID 

1625 NO: 860, SEQ ID NO : 861, SEQ ID NO: 862, SEQ ID NO : 863, 
SEQ ID NO: 864, SEQ ID NO: 865, SEQ ID NO : 866, SEQ ID NO: 
867, SEQ ID NO: 868, SEQ ID NO: 869, SEQ ID NO: 870, SEQ ID 
NO: 871, SEQ ID NO: 872, SEQ ID NO: 873, SEQ ID NO : 874, 
SEQ ID NO: 875, SEQ ID NO: 877, SEQ ID NO: 878, SEQ ID NO: 

1630 879, SEQ ID NO : 880, SEQ ID NO: 881, SEQ ID NO: 882, SEQ ID 
NO: 883, SEQ ID NO : 884, SEQ. ID NO: 885, SEQ ID NO : 886, 
SEQ ID NO: 887, SEQ ID NO: 888, SEQ ID NO: 889, SEQ ID NO: 
890, SEQ ID NO: 891, SEQ ID NO: 892, SEQ ID NO: 893, SEQ ID 
NO: 894, SEQ ID NO : 895, SEQ ID NO : 896, SEQ ID NO: 897, 

1635 SEQ ID NO: 898, SEQ ID NO: 899, SEQ ID NO: 900, SEQ ID NO: 
901, SEQ ID NO: 902, SEQ ID NO : 903, SEQ ID NO: 904, SEQ ID 
NO: 905, SEQ ID NO: 906, SEQ ID NO: 907, SEQ ID NO : 908, 
SEQ ID NO: 909, SEQ ID NO : 910, SEQ ID NO : 911, SEQ ID NO: 
912, SEQ ID NO : 913, SEQ. ID NO: 914, SEQ ID NO: 915, SEQ ID 

1640 NO: 916, SEQ ID NO: 917, SEQ ID NO: 918, SEQ ID NO: 919, 
SEQ ID NO: 920, SEQ ID NO: 921, SEQ ID NO:922, SEQ ID NO: 
923, SEQ ID NO: 924, SEQ ID NO: 925, SEQ ID NO: 926, SEQ ID
NO: 928, SEQ ID NO : 929, SEQ ID NO: 930, SEQ ID NO: 931, 
SEQ ID NO: 932, SEQ. ID NO: 933, SEQ ID NO : 934, SEQ ID NO: 

1645 935, SEQ ID NO: 936, SEQ ID NO: 937, SEQ ID NO: 938, SEQ ID 
NO: 939, SEQ ID NO : 940, SEQ ID NO: 941, SEQ ID NO: 942, 
SEQ ID NO: 943, SEQ ID NO : 944, SEQ ID NO: 945, SEQ ID NO: 
946, SEQ ID NO: 947, SEQ ID NO: 948, SEQ ID NO: 949, SEQ ID
NO: 950, SEQ ID NO : 951, SEQ ID NO: 952, SEQ ID NO : 953, 

1650 SEQ ID NO: 954, SEQ. ID NO: 955, SEQ. ID NO : 956, SEQ ID NO: 
957, SEQ ID NO: 958, SEQ ID NO: 959, SEQ ID NO: 960, SEQ ID 
NO: 961, SEQ ID NO: 962, SEQ ID NO: 963, SEQ ID NO: 964, SEQ
ID NO: 965, SEQ ID NO: 966, SEQ ID NO: 967, SEQ ID NO:
968, SEQ. ID NO: 969, SEQ. ID NO: 970, SEQ ID NO: 971, SEQ ID 
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1 655 NO: 972, SEQ ID NO : 973, SEQ ID NO: 974, SEQ ID NO: 975, 
SEQ ID NO: 976, SEQ ID NO: 977, SEQ ID NO: 979, SEQ ID NO: 
980, SEQ ID NO: 981, SEQ ID NO: 982, SEQ ID NO: 983, SEQ ID 
NO: 984, SEQ ID NO: 985, SEQ ID NO: 986, SEQ ID NO: 987,
SEQ ID NO: 988, SEQ ID NO: 989, SEQ ID NO: 990, SEQ ID NO: 

1660 991, SEQ ID NO: 992, SEQ ID NO: 993, SEQ ID NO: 994, SEQ ID 
NO: 995, SEQ ID NO: 996, SEQ ID NO: 997, SEQ ID NO : 998, 
SEQ ID NO: 999, SEQ ID NO: 1000, SEQ ID NO: 1001, SEQ ID 
NO: 1002, SEQ ID NO: 1003, SEQ ID NO: 1004, SEQ ID NO: 1005,
SEQID NO: 1006, SEQ ID NO: 1007, SEQ ID NO: 1008, SEQ ID 

1665 NO: 1009, SEQ IDNO: 1010, SEQ ID NO: 1011, SEQ ID NO: 1012, 
SEQ ID NO: 1014, SEQ ID NO: 1015, SEQ ID NO: 1016, SEQ ID 
NO: 1017, SEQ ID NO: 1018, SEQ ID NO : 1019, SEQ ID NO: 1020, 
SEQ ID NO: 1021, SEQ ID NO: 1022, SEQ ID NO: 1023, SEQ ID
NO: 1024, SEQ ID NO: 1025, SEQ ID NO: 1026, SEQ. ID NO: 1027, 

1670 SEQ IDNO: 1028, SEQ. ID NO: 1030, SEQ ID NO: 1031, SEQ ID 
NO: 1032, SEQ ID NO: 1033, SEQ ID NO: 1034, SEQ ID NO: 1035, 
SEQ ID NO: 1036, SEQ ID NO : 1037, SEQ ID NO: 1038, SEQ ID 
NO: 1039, SEQ ID NO: 1040, SEQ ID NO: 1041, SEQ ID NO: 1042, 
SEQ ID NO: 1043, SEQ ID NO: 1044, SEQ ID NO: 1045, SEQID 

1675 NO : 1046, SEQ. ID NO : 1047, SEQ ID NO : 1048, SEQ. ID NO : 1049, 
SEQ ID NO: 1050, SEQ ID NO: 1051, SEQ ID NO: 1052, SEQ ID
NO: 1053, SEQ ID NO: 1054, SEQ ID NO: 1056, SEQ. ID NO: 1057, 
SEQ ID NO: 1058, SEQ ID NO: 1059, SEQ ID NO: 1061, SEQ ID 
NO: 1082, SEQ ID NO: 1063, SEQ ID NO: 1064, SEQID NO: 1085, 

1680 SEQ ID NO: 1066, SEQ ID NO: 1067, SEQ ID NO: 1068, SEQ
ID NO: 1069, SEQ ID NO: 1070, SEQ ID NO: 1071, SEQ ID NO:
1072, SEQ ID NO: 1073, SEQ ID NO: 1074, SEQ ID NO: 1075, 
SEQ ID NO: 1076, SEQ ID NO: 1077, SEQ ID NO: 1078, SEQ ID 
NO: 1079, SEQ ID NO: 1080, SEQ ID NO: 1081, SEQ ID NO: 1082,

1685 SEQ ID NO: 1083, SEQ ID NO: 1084, SEQ ID NO: 1085, SEQ
ID NO: 1086, SEQ ID NO: 1087, SEQ ID NO: 1088, SEQ ID NO:
1089, SEQ ID NO: 1090, SEQ ID NO: 1091, SEQ ID NO: 1092,
SEQ ID NO: 1094, SEQ ID NO: 1095, SEQ ID NO: 1096, SEQ ID
NO: 1097, SEQ ID NO: 1098, SEQ ID NO: 1099, SEQ ID NO: 1100,

1690 SEQ ID NO: 1101, SEQ ID NO: 1102, SEQ ID NO: 1103, SEQ ID
NO: 1104, SEQ ID NO: 1105, SEQ ID NO: 1106, SEQ ID NO: 1107,
SEQ ID NO: 1108, SEQ ID NO: 1109, SEQ ID NO: 1110, SEQ ID
NO: 1111, SEQ ID NO: 1112, SEQ ID NO: 1113, SEQ ID NO: 1114,
SEQ ID NO: 1115, SEQ ID NO: 1116, SEQ ID NO: 1117, SEQ ID

1695 NO: 1118, SEQ ID NO: 1119, SEQ ID NO: 1120, SEQ ID NO: 1121,
SEQ ID NO: 1122, SEQ ID NO: 1123, SEQ ID NO: 1124, SEQ
ID NO: 1125, SEQ ID NO: 1126, SEQ ID NO: 1127, SEQ ID NO:
1129, SEQ ID NO: 1130, SEQ ID NO: 1131, SEQ ID NO: 1132,
SEQ ID NO: 1133, SEQ ID NO: 1134, SEQ ID NO: 1135, SEQ ID

1700 NO: 1136, SEQ ID NO: 1137, SEQ ID NO: 1138, SEQ ID NO: 1139,
SEQ ID NO: 1140, SEQ ID NO: 1141, SEQ ID NO: 1142, SEQ
ID NO: 1143, SEQ ID NO: 1144, SEQ ID NO: 1145, SEQ ID NO:
1146, SEQ ID NO: 1147, SEQ ID NO: 1148, SEQ ID NO: 1149, 
SEQ ID NO: 1150, SEQ ID NO: 1151, SEQ ID NO: 1152, SEQ ID 

1705 NO: 1153, SEQ ID NO: 1154, SEQ ID NO: 1155, SEQ ID NO: 1156,
SEQ ID NO: 1158, SEQ ID NO: 1159, SEQ ID NO: 1160, SEQ ID
NO: 1161, SEQ ID NO: 1162, SEQ ID NO: 1163, SEQ ID NO: 1164,
SEQ ID NO: 1165, SEQ ID NO: 1166, SEQ ID NO: 1167, SEQ ID
NO: 1168, SEQ ID NO: 1169, SEQ ID NO: 1170, SEQ ID NO: 1171,

1710 SEQ ID NO: 1172, SEQ ID NO: 1173, SEQ ID NO: 1174, SEQ ID
NO: 1175, SEQ ID NO: 1176, SEQ ID NO: 1177, SEQ ID NO: 1178,
SEQ ID NO: 1179, SEQ ID NO: 1180, SEQ ID NO: 1181, SEQ
ID NO: 1182, SEQ ID NO: 1183, SEQ ID NO: 1184, SEQ ID NO:
1185, SEQ ID NO: 1186, SEQ ID NO: 1187, SEQ ID NO: 1188,

1715 SEQ ID NO: 1189, SEQ ID NO: 1190, SEQ ID NO: 1192, SEQ ID 
NO: 1193, SEQ ID NO: 1194, SEQ ID NO: 1195, SEQ ID NO: 1196, 
SEQ ID NO: 1197, SEQ ID NO: 1198, SEQ ID NO: 1199, SEQ
ID NO: 1200, SEQ ID NO: 1201, SEQ ID NO: 1202, SEQ ID NO:
1203, SEQ ID NO: 1204, SEQ ID NO: 1205, SEQ ID NO: 1206,

1720 SEQ ID NO: 1207, SEQ ID NO: 1208, SEQ ID NO: 1209, SEQ ID 
NO: 1210, SEQ ID NO: 1211, SEQ ID NO: 1213, SEQ ID NO: 1214, 
SEQ ID NO: 1215, SEQ ID NO: 1216, SEQ ID NO: 1217, SEQ ID
NO: 1218, SEQ ID NO: 1219, SEQ ID NO: 1220, SEQ ID NO: 1221,
SEQ ID NO: 1222, SEQ ID NO: 1223, SEQ ID NO: 1224, SEQ ID 

1725 NO: 1225, SEQ ID NO: 1226, SEQ ID NO: 1227, SEQ ID NO: 1228,
SEQ ID NO: 1229, SEQ ID NO: 1230, SEQ ID NO: 1231, SEQ ID
NO: 1232, SEQ ID NO: 1233, SEQ ID NO: 1234, SEQ ID NO: 1235,
SEQ ID NO: 1236, SEQ ID NO: 1237, SEQ ID NO: 1238, SEQ
ID NO: 1239, SEQ ID NO: 1241, SEQ ID NO: 1242, SEQ ID NO:

1730 1243, SEQ ID NO: 1244, SEQ ID NO: 1245, SEQ ID NO: 1246, 
SEQ ID NO: 1247, SEQ ID NO: 1248, SEQ ID NO: 1249, SEQ ID 
NO: 1250, SEQ ID NO: 1251, SEQ ID NO: 1252, SEQ ID NO: 1253, 
SEQ ID NO: 1254, SEQ ID NO: 1255, SEQ ID NO: 1256, SEQ
ID NO: 1257, SEQ ID NO: 1259, SEQ ID NO: 1260, SEQ ID NO:

1735 1261, SEQ ID NO: 1262, SEQ ID NO: 1263, SEQ ID NO: 1264,
SEQ ID NO: 1265, SEQ ID NO: 1266, SEQ ID NO: 1267, SEQ ID
NO: 1268, SEQ ID NO: 1269, SEQ ID NO: 1270, SEQ ID NO: 1271,
SEQ ID NO: 1272, SEQ ID NO: 1273, SEQ ID NO: 1275, SEQ ID
NO: 1276, SEQ ID NO: 1277, SEQ ID NO: 1278, SEQ ID NO: 1279, 

1740 SEQ ID NO: 1280, SEQ ID NO: 1281, SEQ ID NO: 1282, SEQ ID
NO: 1283, SEQ ID NO: 1284, SEQ ID NO: 1285, SEQ ID NO: 1286,
SEQ ID NO: 1287, SEQ ID NO: 1289, SEQ ID NO: 1290, SEQ ID
NO: 1291, SEQ ID NO: 1292, SEQ ID NO: 1293, SEQ ID NO: 1294,
SEQ ID NO: 1295, SEQ ID NO: 1296, SEQ ID NO: 1297, SEQ

1745 ID NO: 1298, SEQ ID NO: 1299, SEQ ID NO: 1300, SEQ ID NO:
1301, SEQ ID NO: 1303, SEQ ID NO: 1304, SEQ ID NO: 1305, 
SEQ ID NO: 1306, SEQ ID NO: 1307, SEQ ID NO: 1308, SEQ ID 
NO: 1310, SEQ ID NO: 1311, SEQ ID NO: 1312, SEQ ID NO: 1313,
SEQ ID NO: 1314, SEQ ID NO: 1315, SEQ ID NO: 1316, SEQ

1750 ID NO: 1317, SEQ ID NO: 1318, SEQ ID NO: 1319, SEQ ID NO:
1320, SEQ ID NO: 1322, SEQ ID NO: 1323, SEQ ID NO: 1324, 
SEQ ID NO: 1325, SEQ ID NO: 1326, SEQ ID NO: 1327, SEQ ID 
NO: 1328, SEQ ID NO: 1330, SEQ ID NO: 1331, SEQ ID NO: 1332, 
SEQ ID NO: 1333, SEQ ID NO: 1334, SEQ ID NO: 1335, SEQ ID

1755 NO: 1336, SEQ ID NO: 1337, SEQ ID NO: 1339, SEQ ID NO: 1340,
SEQ ID NO: 1341, SEQ ID NO: 1342, SEQ ID NO: 1343, SEQ ID
NO: 1344, SEQ ID NO: 1345, SEQ ID NO: 1346, SEQ ID NO: 1347,
SEQ ID NO: 1349, SEQ ID NO: 1350, SEQ ID NO: 1351, SEQ ID
NO: 1352, SEQ ID NO: 1353, SEQ ID NO: 1354, SEQ ID NO: 1355,

1760 SEQ ID NO: 1356, SEQ ID NO: 1357, SEQ ID NO: 1358, SEQ
ID NO: 1360, SEQ ID NO: 1361, SEQ ID NO: 1362, SEQ ID NO:
1363, SEQ ID NO: 1364, SEQ ID NO: 1365, SEQ ID NO: 1367, 
SEQ ID NO: 1368, SEQ ID NO : 1369, SEQ ID NO: 1370, SEQ ID 
NO: 1371, SEQ ID NO: 1375, SEQ ID NO: 1376, SEQ ID NO: 1377, 

1765 SEQ ID NO: 1378, SEQ ID NO: 1379, SEQ ID NO: 1381, SEQ
ID NO: 1382, SEQ ID NO: 1383, SEQ ID NO: 1384, SEQ ID NO:
1385, SEQ ID NO: 1387, SEQ ID NO: 1388, SEQ ID NO: 1389,
SEQ ID NO: 1390, SEQ ID NO: 1391, SEQ ID NO: 1392, SEQ ID
NO: 1393, SEQ ID NO: 1395, SEQ ID NO: 1396, SEQ ID NO: 1397,

1770 SEQ ID NO: 1398, SEQ ID NO: 1399, SEQ ID NO: 1400, SEQ ID
NO: 1402, SEQ ID NO: 1403, SEQ ID NO: 1404, SEQ ID NO: 1405,
SEQ ID NO: 1406, SEQ ID NO: 1407, SEQ ID NO: 1409, SEQ ID
NO: 1410, SEQ ID NO: 1412, SEQ ID NO: 1413, SEQ ID NO: 1414,
SEQ ID NO: 1415, SEQ ID NO: 1416, SEQ ID NO: 1417, SEQ ID

1775 NO: 1419, SEQ ID NO: 1420, SEQ ID NO: 1421, SEQ ID NO: 1422,
SEQ ID NO: 1423, SEQ ID NO: 1424, SEQ ID NO: 1425, SEQ
ID NO: 1427, SEQ ID NO: 1428, SEQ ID NO: 1429, SEQ ID NO:
1430, SEQ ID NO: 1431, SEQ ID NO: 1432, SEQ ID NO: 1433, 
SEQ ID NO: 1434, SEQ ID NO: 1435, SEQ ID NO: 1437, SEQ ID 

1780 NO: 1438, SEQ ID NO: 1439, SEQ ID NO: 1440, SEQ ID NO: 1441, 
SEQ ID NO: 1442, SEQ ID NO: 1444, SEQ ID NO: 1445, SEQ 
ID NO: 1446, SEQ ID NO: 1447, SEQ ID NO: 1448, SEQ ID NO:
1449, SEQ ID NO: 1451, SEQ ID NO: 1452, SEQ ID NO: 1453, 
SEQ ID NO: 1454, SEQ ID NO: 1455, SEQ ID NO: 1456, SEQ ID 

1785 NO: 1458, SEQ ID NO: 1459, SEQ ID NO: 1461, SEQ ID NO: 1462,
SEQ ID NO: 1463, SEQ ID NO: 1464, SEQ ID NO: 1465, SEQ ID
NO: 1466, SEQ ID NO: 1468, SEQ ID NO: 1469, SEQ ID NO: 1470,
SEQ ID NO: 1472, SEQ ID NO: 1474, SEQ ID NO: 1475, SEQ ID
NO: 1476, SEQ ID NO: 1477, SEQ ID NO: 1479, SEQ ID NO: 1480,

1790 SEQ ID NO: 1481, SEQ ID NO: 1482, SEQ ID NO: 1483, SEQ ID
NO: 1484, SEQ ID NO: 1485, SEQ ID NO: 1486, SEQ ID NO: 1488,
SEQ ID NO: 1490, SEQ ID NO: 1491, SEQ ID NO: 1492, SEQ
ID NO: 1493, SEQ ID NO: 1495, SEQ ID NO: 1496, SEQ ID NO:
1497, SEQ ID NO: 1498, SEQ ID NO: 1500, SEQ ID NO: 1502,

1795 SEQ ID NO: 1503, SEQ ID NO: 1504, SEQ ID NO: 1505, SEQ
ID NO: 1507, SEQ ID NO: 1509, SEQ ID NO: 1512, SEQ ID NO:
1513, SEQ ID NO: 1514, SEQ ID NO: 1515, SEQ ID NO: 1517,
SEQ ID NO: 1518, SEQ ID NO: 1519, SEQ ID NO: 1521, SEQ ID
NO: 1522, SEQ ID NO: 1523, SEQ ID NO: 1524, SEQ ID NO: 1525, 

1800 SEQ ID NO: 1527, SEQ ID NO: 1528, SEQ ID NO: 1529, SEQ ID 
NO: 1530, SEQ ID NO: 1531, SEQ ID NO: 1533, SEQ ID NO: 1534,
SEQ ID NO: 1535, SEQ ID NO: 1536, SEQ ID NO: 1538, SEQ ID
NO: 1539, SEQ ID NO: 1541, SEQ ID NO: 1542, SEQ ID NO: 1543,
SEQ ID NO: 1544, SEQ ID NO: 1546, SEQ ID NO: 1548, SEQ ID

1805 NO: 1550, SEQ ID NO: 1552, SEQ ID NO: 1554, SEQ ID NO: 1556,
SEQ ID NO: 1557, SEQ ID NO: 1559, SEQ ID NO: 1560, SEQ ID
NO: 1561, SEQ ID NO: 1562, SEQ ID NO: 1564, SEQ ID NO: 1565,
SEQ ID NO: 1567, SEQ ID NO: 1568, SEQ ID NO: 1570, SEQ
ID NO: 1572, SEQ ID NO: 1573, SEQ ID NO: 1574, SEQ ID NO:

1810 1575, SEQ ID NO: 1577, SEQ ID NO: 1578, SEQ ID NO: 1579,
SEQ ID NO: 1581, SEQ ID NO: 1582, SEQ ID NO: 1583, SEQ ID
NO: 1585, SEQ ID NO: 1586, SEQ ID NO: 1588, SEQ ID NO: 1589,
SEQ ID NO: 1590, SEQ ID NO: 1592, SEQ ID NO: 1593, SEQ
ID NO: 1595, SEQ ID NO: 1597, SEQ ID NO: 1598, SEQ ID NO:

1815 1600, SEQ ID NO: 1602, SEQ ID NO: 1606, SEQ ID NO: 1608, 
SEQ ID NO: 1609, SEQ ID NO: 1610, SEQ ID NO: 1611, SEQ ID 
NO: 1613, SEQ ID NO: 1614, SEQ ID NO: 1616, SEQ ID NO: 1618,
SEQ ID NO: 1620, SEQ ID NO: 1621, SEQ ID NO: 1623, SEQ ID
NO: 1625, SEQ ID NO: 1628, SEQ ID NO: 1630, SEQ ID NO: 1631,

1820 SEQ ID NO: 1633, SEQ ID NO: 1634, SEQ ID NO: 1638, SEQ ID
NO: 1641, SEQ ID NO: 1642, SEQ ID NO: 1644, SEQ ID NO: 1645,
SEQ ID NO: 1647, SEQ ID NO: 1648, SEQ ID NO: 1650, SEQ ID
NO: 1651, SEQ ID NO: 1653, SEQ ID NO: 1654, SEQ ID NO: 1656,
SEQ ID NO: 1657, SEQ ID NO: 1659, SEQ ID NO: 1661, SEQ
ID NO: 1663, SEQ ID NO: 1665, SEQ ID NO: 1667, SEQ ID NO:
1671, SEQ ID NO: 1674, SEQ ID NO: 1676, SEQ ID NO: 1678,
SEQ ID NO: 1679, SEQ ID NO: 1681, SEQ ID NO: 1684, SEQ ID
NO: 1686, SEQ ID NO: 1687, SEQ ID NO: 1689, SEQ ID NO: 1692,
SEQ ID NO: 1693, SEQ ID NO: 1695, SEQ ID NO: 1697, SEQ
ID NO: 1698, SEQ ID NO: 1702, and SEQ ID NO: 1703,

or (b) a polypeptide comprising an amino acid sequence
in the nucleotide sequences set forth in (a) in which several
amino acids are deleted, replaced or added.
[0013] 

In another embodiment, the present invention relates to a
polypeptide specific to enterohemorrhagic pathogenic E. coli
O-157:H7.

In a preferred embodiment, the present invention relates to a
polypeptide specific to O-157:H7 comprising

(a) an amino acid sequence selected from a group
comprising the following SEQ IDs or a fragment thereof: SEQ
ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID
NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO:
10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID
NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID
NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID
NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID
NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID
NO: 30, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID
NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID
NO: 38, SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 41, SEQ ID
NO: 42, SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45, SEQ ID
NO: 46, SEQ ID NO: 47, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID
NO: 50, SEQ ID NO: 51, SEQ ID NO: 52, SEQ ID NO: 53, SEQ ID
NO: 54, SEQ ID NO: 55, SEQ ID NO: 56, SEQ ID NO: 57, SEQ ID
NO: 58, SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, SEQ ID
NO: 62, SEQ ID NO: 63, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID
NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, SEQ ID

Appendix B: Hideo et al. Full Translation

NO: 70, SEQ ID NO: 71, SEQ ID NO: 72, SEQ ID NO: 73, SEQ ID

1860 NO: 74, SEQ ID NO: 75, SEQ ID NO: 76, SEQ ID NO: 77, SEQ ID
NO: 78, SEQ ID NO: 79, SEQ ID NO: 80, SEQ ID NO: 81, SEQ ID
NO: 82, SEQ ID NO: 83, SEQ ID NO: 84, SEQ ID NO: 85, SEQ ID
NO: 86, SEQ ID NO: 87, SEQ ID NO: 88, SEQ ID NO: 89, SEQ ID
NO: 90, SEQ ID NO: 91, SEQ ID NO: 92, SEQ ID NO: 93, SEQ ID

1865 NO: 94, SEQ ID NO: 95, SEQ ID NO: 96, SEQ ID NO: 97, SEQ ID
NO: 98, SEQ ID NO: 99, SEQ ID NO: 100, SEQ ID NO: 101, SEQ
ID NO: 102, SEQ ID NO: 103, SEQ ID NO: 104, SEQ ID NO: 105,
SEQ ID NO: 106, SEQ ID NO: 107, SEQ ID NO: 108, SEQ ID NO: 
109, SEQ. ID NO: 110, SEQ. ID NO: 111, SEQ ID NO: 112, SEQ ID 

1870 NO: 113, SEQ ID NO: 114, SEQ ID NO: 115, SEQ ID NO: 116, 
SEQ ID NO: 117, SEQ ID NO: 118, SEQ ID NO: 119, SEQ ID NO: 
120, SEQ ID NO: 121, SEQ ID NO: 122, SEQ ID NO: 123, SEQ ID 
NO: 124, SEQ ID NO: 125, SEQ ID NO: 126, SEQ ID NO: 127, 
SEQ ID NO: 128, SEQ ID NO: 129, SEQ ID NO: 130, SEQ ID NO: 

1875 131, SEQ ID NO: 133, SEQ ID NO: 134, SEQ ID NO: 135, SEQ ID 
NO: 136, SEQ ID NO: 137, SEQ ID NO: 138, SEQ ID NO: 139,
SEQ ID NO: 140, SEQ ID NO: 141, SEQ ID NO: 142, SEQ ID NO:
143, SEQ ID NO: 144, SEQ ID NO: 145, SEQ ID NO: 146, SEQ ID
NO: 147, SEQ ID NO: 148, SEQ ID NO: 149, SEQ ID NO: 150,

1880 SEQ ID NO: 151, SEQ ID NO: 152, SEQ ID NO: 153, SEQ ID NO:
154, SEQ ID NO: 155, SEQ ID NO: 156, SEQ ID NO: 157, SEQ ID
NO: 158, SEQ ID NO: 159, SEQ ID NO: 160, SEQ ID NO: 161, 
SEQ ID NO: 162, SEQ ID NO : 163, SEQ ID NO: 164, SEQ ID NO: 
165, SEQ. ID NO: 166, SEQ ID NO: 167, SEQ ID NO: 168, SEQ ID 

1885 NO: 169, SEQ ID NO: 170, SEQ ID NO: 171, SEQ ID NO: 172, 
SEQ ID NO: 173, SEQ ID NO: 174, SEQ ID NO: 175, SEQ ID NO: 
176, SEQ ID NO: 177, SEQ ID NO: 178, SEQ ID NO: 179, SEQ ID 
NO: 180, SEQ ID NO: 181, SEQ ID NO: 182, SEQ ID NO: 183, 
SEQ ID NO: 184, SEQ ID NO: 185, SEQ ID NO: 186, SEQ ID NO:

1890 187, SEQ ID NO: 188, SEQ ID NO: 189, SEQ ID NO: 190, SEQ ID
NO: 191, SEQ ID NO: 192, SEQ ID NO: 193, SEQ ID NO: 194,
SEQ ID NO: 195, SEQ ID NO: 196, SEQ ID NO: 197, SEQ ID NO:
198, SEQ ID NO: 199, SEQ ID NO: 200, SEQ ID NO: 201, SEQ ID
NO: 202, SEQ ID NO: 203, SEQ ID NO: 204, SEQ ID NO: 205,

1895 SEQ ID NO: 206, SEQ ID NO: 207, SEQ ID NO: 208, SEQ ID NO: 
209, SEQ ID NO: 210, SEQ ID NO: 211, SEQ ID NO: 212, SEQ ID
NO: 213, SEQ ID NO: 214, SEQ ID NO: 215, SEQ ID NO: 216,
SEQ ID NO: 217, SEQ ID NO: 218, SEQ ID NO: 219, SEQ ID NO:
220, SEQ ID NO: 221, SEQ ID NO: 222, SEQ ID NO: 223, SEQ ID

1900 NO: 224, SEQ ID NO: 225, SEQ ID NO: 226, SEQ ID NO: 227, 
SEQ ID NO: 228, SEQ ID NO: 229, SEQ ID NO: 230, SEQ ID NO: 
231, SEQ ID NO: 232, SEQ ID NO: 233, SEQ ID NO: 234, SEQ
ID NO: 235, SEQ ID NO: 236, SEQ ID NO: 237, SEQ ID NO: 238,
SEQ ID NO: 239, SEQ ID NO: 240, SEQ ID NO: 241, SEQ ID NO:

1905 242, SEQ ID NO: 243, SEQ ID NO: 245, SEQ ID NO: 246, SEQ ID
NO: 247, SEQ ID NO: 248, SEQ ID NO: 249, SEQ ID NO: 250,
SEQ ID NO: 251, SEQ ID NO: 252, SEQ ID NO: 253, SEQ ID NO:
254, SEQ ID NO: 255, SEQ ID NO: 256, SEQ ID NO: 257, SEQ ID
NO: 258, SEQ ID NO: 259, SEQ ID NO: 260, SEQ ID NO: 261,

1910 SEQ ID NO: 262, SEQ ID NO: 263, SEQ ID NO: 264, SEQ ID NO:
265, SEQ ID NO: 266, SEQ ID NO: 267, SEQ ID NO: 268, SEQ ID
NO: 269, SEQ ID NO: 270, SEQ ID NO: 271, SEQ ID NO: 272,
SEQ ID NO: 273, SEQ ID NO : 274, SEQ ID NO: 275, SEQ ID NO: 
276, SEQ ID NO: 277, SEQ ID NO: 278, SEQ ID NO: 279, SEQ ID 

1915 NO: 280, SEQ ID NO: 281, SEQ ID NO: 282, SEQ ID NO: 283, 
SEQ ID NO: 284, SEQ. ID NO: 285, SEQ. ID NO: 286, SEQ ID NO: 
287, SEQ ID NO: 288, SEQ ID NO: 289, SEQ ID NO: 290, SEQ ID 
NO: 291, SEQ ID NO: 292, SEQ ID NO: 293, SEQ ID NO: 294, 
SEQ ID NO: 295, SEQ ID NO: 296, SEQ ID NO: 297, SEQ ID NO: 

1920 298, SEQ ID NO: 299, SEQ ID NO: 300, SEQ ID NO: 301, SEQ ID 
NO: 302, SEQ ID NO: 303, SEQ ID NO : 304, SEQ ID NO : 305, 
SEQ ID NO: 306, SEQ ID NO: 307, SEQ ID NO: 308, SEQ ID NO:
309, SEQ ID NO: 310, SEQ ID NO: 311, SEQ ID NO: 312, SEQ ID
NO: 313, SEQ ID NO: 314, SEQ ID NO: 315, SEQ ID NO: 316,

1925 SEQ ID NO: 317, SEQ ID NO: 318, SEQ ID NO: 319, SEQ ID NO:
320, SEQ ID NO: 321, SEQ ID NO: 322, SEQ ID NO: 323, SEQ ID
NO: 324, SEQ ID NO: 325, SEQ ID NO: 326, SEQ ID NO: 327,
SEQ ID NO: 328, SEQ ID NO: 329, SEQ ID NO: 330, SEQ ID NO:
331, SEQ ID NO: 332, SEQ ID NO: 333, SEQ ID NO: 334, SEQ ID 

1930 NO: 335, SEQ ID NO: 336, SEQ ID NO : 338, SEQ ID NO : 339, 
SEQ ID NO: 340, SEQ ID NO : 341, SEQ ID NO: 342, SEQ ID NO: 
343, SEQ ID NO: 344, SEQ ID NO: 345, SEQ ID NO: 346, SEQ ID 
NO: 347, SEQ ID NO: 348, SEQ ID NO: 349, SEQ ID NO : 350, 
SEQ ID NO: 351, SEQ ID NO: 352, SEQ ID NO: 353, SEQ ID NO: 

1935 354, SEQ ID NO: 355, SEQ ID NO: 356, SEQ ID NO: 357, SEQ ID 
NO: 358, SEQ ID NO: 359, SEQ ID NO: 360, SEQ ID NO: 361, 
SEQ ID NO: 362, SEQ ID NO: 363, SEQ ID NO : 364, SEQ ID NO: 
365, SEQ ID NO : 366, SEQ ID NO : 367, SEQ ID NO: 368, SEQ ID 
NO: 369, SEQ ID NO: 370, SEQ ID NO: 371, SEQ ID NO: 372, 

1940 SEQ ID NO: 373, SEQ ID NO: 374, SEQ ID NO: 375, SEQ ID NO: 
376, SEQ ID NO : 377, SEQ ID NO: 378, SEQ ID NO: 379, SEQ ID 
NO: 380, SEQ ID NO: 381, SEQ ID NO: 382, SEQ ID NO: 383, 
SEQ ID NO: 384, SEQ ID NO : 385, SEQ ID NO: 386, SEQ ID NO: 
387, SEQ ID NO: 388, SEQ ID NO: 389, SEQ ID NO: 390, SEQ ID 

1945 NO: 391, SEQ ID NO: 392, SEQ ID NO: 393, SEQ ID NO : 394, 
SEQ ID NO: 395, SEQ ID NO: 396, SEQ ID NO: 397, SEQ ID NO: 
398, SEQ ID NO: 399, SEQ ID NO: 400, SEQ ID NO: 401, SEQ ID 
NO: 402, SEQ ID NO: 403, SEQ ID NO: 404, SEQ ID NO : 405, 
SEQ ID NO: 406, SEQ ID NO: 407, SEQ ID NO: 408, SEQ ID NO: 

1950 409, SEQ ID NO: 411, SEQ ID NO: 412, SEQ ID NO: 413, SEQ ID 
NO: 414, SEQ ID NO: 415, SEQ ID NO: 416, SEQ ID NO: 417, 
SEQ ID NO: 418, SEQ ID NO : 419, SEQ ID NO: 420, SEQ ID NO: 
421, SEQ ID NO: 422, SEQ ID NO: 423, SEQ ID NO: 424, SEQ ID 
NO: 425, SEQ ID NO: 426, SEQ ID NO: 427, SEQ ID NO: 428, 

1955 SEQ ID NO: 429, SEQ ID NO: 430, SEQ ID NO: 431, SEQ ID NO:
432, SEQ ID NO: 433, SEQ ID NO: 434, SEQ ID NO: 435, SEQ ID
NO: 436, SEQ ID NO: 437, SEQ ID NO: 438, SEQ ID NO: 439,
SEQ ID NO: 440, SEQ ID NO: 441, SEQ ID NO: 442, SEQ ID NO:
443, SEQ ID NO: 444, SEQ ID NO: 445, SEQ ID NO: 446, SEQ ID

1960 NO: 447, SEQ ID NO: 448, SEQ ID NO: 449, SEQ ID NO: 450,
SEQ ID NO: 451, SEQ ID NO: 452, SEQ ID NO: 453, SEQ ID NO:
454, SEQ ID NO: 455, SEQ ID NO: 456, SEQ ID NO: 457, SEQ ID
NO: 458, SEQ ID NO: 459, SEQ ID NO: 460, SEQ ID NO: 461,
SEQ ID NO: 462, SEQ ID NO: 463, SEQ ID NO: 464, SEQ ID NO:

1965 465, SEQ ID NO : 466, SEQ ID NO: 467, SEQ ID NO: 468, SEQ ID 
NO: 469, SEQ ID NO: 470, SEQ ID NO: 471, SEQ ID NO: 472, 
SEQ ID NO: 473, SEQ ID NO: 474, SEQ ID NO: 475, SEQ ID NO: 
476, SEQ ID NO: 477, SEQ ID NO: 478, SEQ ID NO: 479, SEQ ID 
NO: 480, SEQ ID NO: 481, SEQ ID NO: 482, SEQ ID NO: 483,

1970 SEQ ID NO: 485, SEQ ID NO: 486, SEQ ID NO: 487, SEQ ID NO:
488, SEQ ID NO: 489, SEQ ID NO: 490, SEQ ID NO: 491, SEQ ID
NO: 492, SEQ ID NO: 493, SEQ ID NO: 494, SEQ ID NO: 495,
SEQ ID NO: 496, SEQ ID NO: 497, SEQ ID NO: 498, SEQ ID NO: 
499, SEQ ID NO: 500, SEQ ID NO : 501, SEQ ID NO: 502, SEQ ID 

1975 NO: 503, SEQ ID NO: 504, SEQ ID NO: 505, SEQ ID NO: 506,
SEQ ID NO: 507, SEQ ID NO: 508, SEQ ID NO: 509, SEQ ID NO:
510, SEQ ID NO: 511, SEQ ID NO: 512, SEQ ID NO: 513, SEQ ID
NO: 514, SEQ ID NO: 515, SEQ ID NO: 516, SEQ ID NO: 517,
SEQ ID NO: 518, SEQ ID NO: 519, SEQ ID NO: 520, SEQ ID NO:

1980 521, SEQ ID NO: 522, SEQ ID NO: 523, SEQ ID NO: 524, SEQ ID
NO: 525, SEQ ID NO: 526, SEQ ID NO: 527, SEQ ID NO: 528, 
SEQ ID NO: 529, SEQ ID NO: 530, SEQ ID NO: 531, SEQ ID NO: 
532, SEQ ID NO: 533, SEQ ID NO: 534, SEQ ID NO: 535, SEQ ID 
NO: 536, SEQ ID NO: 537, SEQ ID NO: 538, SEQ ID NO: 539,

1985 SEQ ID NO: 540, SEQ ID NO: 541, SEQ ID NO: 542, SEQ ID NO:
543, SEQ ID NO: 544, SEQ ID NO: 545, SEQ ID NO: 546, SEQ ID
NO: 547, SEQ ID NO: 548, SEQ ID NO: 549, SEQ ID NO: 550,
SEQ ID NO: 551, SEQ ID NO: 552, SEQ ID NO: 553, SEQ ID NO:
555, SEQ ID NO: 556, SEQ ID NO: 557, SEQ ID NO: 558, SEQ ID

1990 NO: 559, SEQ ID NO: 560, SEQ ID NO: 561, SEQ ID NO: 562, 
SEQ ID NO: 563, SEQ ID NO: 564, SEQ ID NO: 565, SEQ ID NO: 
566, SEQ ID NO: 567, SEQ ID NO: 568, SEQ ID NO: 569, SEQ ID 
NO: 570, SEQ ID NO: 571, SEQ ID NO: 572, SEQ ID NO: 573,
SEQ ID NO: 574, SEQ ID NO: 575, SEQ ID NO: 576, SEQ ID NO:
1995 577, SEQ ID NO: 578, SEQ ID NO: 579, SEQ ID NO: 580, SEQ ID
NO: 581, SEQ ID NO: 582, SEQ ID NO: 583, SEQ ID NO: 584,
SEQ ID NO: 585, SEQ ID NO: 586, SEQ ID NO: 587, SEQ ID NO:
588, SEQ ID NO: 589, SEQ ID NO: 590, SEQ ID NO: 591, SEQ ID
NO: 592, SEQ ID NO: 593, SEQ ID NO: 594, SEQ ID NO: 595,

2000 SEQ ID NO: 596, SEQ ID NO: 597, SEQ ID NO : 598, SEQ ID NO: 
599, SEQ ID NO: 600, SEQ ID NO: 601, SEQ ID NO: 602, SEQ ID 
NO: 603, SEQ ID NO : 604, SEQ ID NO: 605, SEQ ID NO : 606, 
SEQ ID NO: 607, SEQ ID NO: 608, SEQ ID NO: 609, SEQ ID NO: 
610, SEQ ID NO: 611, SEQ ID NO: 612, SEQ ID NO: 613, SEQ ID 

2005 NO: 614, SEQ ID NO: 615, SEQ. ID NO : 616, SEQ ID NO: 617, 
SEQ ID NO: 618, SEQ ID NO: 619, SEQ ID NO: 620, SEQ ID NO: 
621, SEQ ID NO: 622, SEQ ID NO: 623, SEQ ID NO: 624, SEQ ID 
NO: 625, SEQ ID NO : 626, SEQ ID NO: 627, SEQ ID NO: 628, 
SEQ ID NO: 629, SEQ ID NO : 631, SEQ ID NO: 632, SEQ ID NO: 

2010 633, SEQ ID NO: 634, SEQ ID NO: 635, SEQ ID NO: 636, SEQ ID 
NO: 637, SEQ ID NO: 638, SEQ ID NO: 639, SEQ ID NO: 640, 
SEQ ID NO: 641, SEQ ID NO: 642, SEQ ID NO: 643, SEQ ID NO: 
644, SEQ ID NO: 645, SEQ ID NO: 646, SEQ ID NO: 647, SEQ ID
NO: 648, SEQ ID NO : 649, SEQ ID NO: 650, SEQ ID NO: 651, 

2015 SEQ ID NO: 652, SEQ ID NO: 653, SEQ ID NO : 654, SEQ ID NO: 
655, SEQ ID NO: 656, SEQ ID NO: 657, SEQ ID NO: 658, SEQ ID 
NO: 659, SEQ ID NO : 660, SEQ ID NO: 661, SEQ ID NO: 662, 
SEQ ID NO: 663, SEQ. ID NO : 664, SEQ ID NO: 665, SEQ ID NO: 
666, SEQ ID NO: 667, SEQ ID NO: 668, SEQ ID NO: 669, SEQ ID 

2020 NO: 670, SEQ ID NO: 671, SEQ. ID NO: 672, SEQ ID NO : 673, 
SEQ ID NO: 674, SEQ ID NO: 675, SEQ ID NO : 676, SEQ ID NO: 
677, SEQ ID NO: 678, SEQ ID NO: 679, SEQ ID NO: 680, SEQ ID 
NO: 681, SEQ ID NO: 682, SEQ ID NO: 683, SEQ ID NO: 684, 
SEQ ID NO: 685, SEQ. ID NO: 686, SEQ. ID NO: 687, SEQ ID NO: 

2025 688, SEQ ID NO : 690, SEQ ID NO: 691, SEQ ID NO: 692, SEQ ID 
NO: 693, SEQ ID NO: 694, SEQ ID NO: 695, SEQ ID NO: 696, 
SEQ ID NO: 697, SEQ ID NO: 698, SEQ ID NO: 699, SEQ ID NO: 
700, SEQ ID NO: 701, SEQ ID NO: 702, SEQ ID NO: 703, SEQ ID
NO: 704, SEQ ID NO: 705, SEQ ID NO: 706, SEQ ID NO: 707,

2030 SEQ ID NO: 708, SEQ ID NO: 709, SEQ ID NO: 710, SEQ ID NO:
711, SEQ ID NO: 712, SEQ ID NO: 713, SEQ ID NO: 714, SEQ ID
NO: 715, SEQ ID NO: 716, SEQ ID NO: 717, SEQ ID NO: 718,
SEQ ID NO: 719, SEQ ID NO: 720, SEQ ID NO: 721, SEQ ID NO: 
722, SEQ ID NO: 723, SEQ ID NO: 724, SEQ ID NO: 725, SEQ ID 

2035 NO: 726, SEQ ID NO: 727, SEQ ID NO: 728, SEQ ID NO: 729, 
SEQ ID NO: 730, SEQ ID NO: 731, SEQ ID NO: 732, SEQ ID NO: 
733, SEQ ID NO: 734, SEQ ID NO: 735, SEQ ID NO: 736, SEQ ID 
NO: 737, SEQ ID NO: 738, SEQ ID NO: 739, SEQ ID NO: 740, 
SEQ ID NO: 741, SEQ ID NO: 742, SEQ ID NO: 743, SEQ ID NO: 

2040 744, SEQ ID NO: 745, SEQ ID NO: 746, SEQ ID NO: 747, SEQ ID 
NO: 748, SEQ ID NO: 749, SEQ ID NO: 750, SEQ ID NO: 751, 
SEQ ID NO: 752, SEQ ID NO: 753, SEQ ID NO: 754, SEQ ID NO: 
756, SEQ ID NO: 757, SEQ ID NO: 758, SEQ ID NO: 759, SEQ ID
NO: 760, SEQ ID NO: 761, SEQ ID NO: 762, SEQ ID NO: 763, SEQ

2045 ID NO: 764, SEQ ID NO: 765, SEQ ID NO: 766, SEQ ID NO:
767, SEQ ID NO: 768, SEQ ID NO: 769, SEQ ID NO: 770, SEQ ID 
NO: 771, SEQ ID NO: 772, SEQ ID NO: 773, SEQ ID NO: 774, 
SEQ ID NO: 775, SEQ ID NO: 776, SEQ ID NO: 777, SEQ ID NO: 
778, SEQ ID NO: 779, SEQ ID NO: 780, SEQ ID NO: 781, SEQ ID 

2050 NO: 782, SEQ ID NO: 783, SEQ ID NO: 784, SEQ ID NO : 785, 
SEQ ID NO: 786, SEQ ID NO: 787, SEQ ID NO : 788, SEQ ID NO: 
789, SEQ ID NO: 790, SEQ ID NO: 791, SEQ ID NO: 792, SEQ ID 
NO: 793, SEQ ID NO: 794, SEQ ID NO: 795, SEQ ID NO: 796, 
SEQ ID NO: 797, SEQ ID NO: 798, SEQ ID NO: 799, SEQ ID NO: 

2055 800, SEQ ID NO: 801, SEQ ID NO: 802, SEQ ID NO: 803, SEQ ID 
NO: 804, SEQ ID NO: 805, SEQ ID NO : 806, SEQ ID NO: 807, 
SEQ ID NO: 808, SEQ ID NO: 809, SEQ ID NO: 810, SEQ ID NO: 
811, SEQ ID NO: 812, SEQ ID NO: 813, SEQ ID NO: 814, SEQ ID 
NO: 815, SEQ ID NO: 817, SEQ ID NO: 818, SEQ ID NO: 819, 

2060 SEQ ID NO: 820, SEQ ID NO: 821, SEQ ID NO: 822, SEQ ID NO: 
823, SEQ ID NO: 824, SEQ ID NO: 825, SEQ ID NO: 826, SEQ ID
NO: 827, SEQ ID NO: 828, SEQ ID NO: 829, SEQ ID NO: 830,
SEQ ID NO: 831, SEQ ID NO: 832, SEQ ID NO: 833, SEQ ID NO:
834, SEQ ID NO: 835, SEQ ID NO: 836, SEQ ID NO: 837, SEQ ID

2065 NO: 838, SEQ ID NO: 839, SEQ ID NO: 840, SEQ ID NO: 841,
SEQ ID NO: 842, SEQ ID NO: 843, SEQ ID NO: 844, SEQ ID NO:
845, SEQ ID NO: 846, SEQ ID NO: 847, SEQ ID NO: 848, SEQ ID 
NO: 849, SEQ ID NO: 850, SEQ ID NO: 851, SEQ ID NO: 852, 
SEQ ID NO: 853, SEQ ID NO: 854, SEQ ID NO: 855, SEQ ID NO: 

2070 856, SEQ ID NO: 857, SEQ ID NO: 858, SEQ ID NO: 859, SEQ ID 
NO: 860, SEQ ID NO: 861, SEQ ID NO: 862, SEQ ID NO: 863, 
SEQ ID NO: 864, SEQ ID NO: 865, SEQ ID NO: 866, SEQ ID NO: 
867, SEQ ID NO: 868, SEQ. ID NO: 869, SEQ ID NO: 870, SEQ ID 
NO: 871, SEQ ID NO: 872, SEQ ID NO: 873, SEQ ID NO : 874, 

2075 SEQ ID NO: 875, SEQ ID NO: 877, SEQ ID NO: 878, SEQ ID NO: 
879, SEQ ID NO: 880, SEQ ID NO : 881, SEQ ID NO: 882, SEQ ID 
NO: 883, SEQ ID NO : 884, SEQ ID NO: 885, SEQ ID NO : 886, 
SEQ ID NO: 887, SEQ ID NO: 888, SEQ ID NO: 889, SEQ ID NO: 
890, SEQ ID NO: 891, SEQ ID NO: 892, SEQ ID NO: 893, SEQ ID

2080 NO: 894, SEQ ID NO: 895, SEQ ID NO : 896, SEQ ID NO: 897, 
SEQ ID NO: 898, SEQ ID NO: 899, SEQ ID NO: 900, SEQ ID NO: 
901, SEQ ID NO: 902, SEQ ID NO: 903, SEQ ID NO: 904, SEQ ID 
NO: 905, SEQ ID NO : 906, SEQ ID NO: 907, SEQ ID NO: 908, 
SEQ ID NO: 909, SEQ ID NO: 910, SEQ ID NO: 911, SEQ ID NO: 

2085 912, SEQ ID NO: 913, SEQ ID NO: 914, SEQ ID NO: 915, SEQ ID 
NO: 916, SEQ ID NO: 917, SEQ ID NO: 918, SEQ ID NO: 919, 
SEQ ID NO: 920, SEQ ID NO : 921, SEQ ID NO: 922, SEQ ID NO: 
923, SEQ. ID NO: 924, SEQ. ID NO: 925, SEQ ID NO: 926, SEQ ID 
NO: 928, SEQ ID NO : 929, SEQ ID NO: 930, SEQ ID NO: 931, 

2090 SEQ ID NO: 932, SEQ ID NO: 933, SEQ ID NO : 934, SEQ ID NO: 
935, SEQ ID NO: 936, SEQ ID NO: 937, SEQ ID NO: 938, SEQ ID 
NO: 939, SEQ ID NO: 940, SEQ ID NO: 941, SEQ ID NO: 942, 
SEQ ID NO: 943, SEQ ID NO: 944, SEQ ID NO: 945, SEQ ID NO:
946, SEQ ID NO: 947, SEQ ID NO: 948, SEQ ID NO: 949, SEQ ID

2095 NO: 950, SEQ ID NO: 951, SEQ ID NO: 952, SEQ ID NO: 953,
SEQ ID NO: 954, SEQ ID NO: 955, SEQ ID NO: 956, SEQ ID NO:
957, SEQ ID NO: 958, SEQ ID NO: 959, SEQ ID NO: 960, SEQ ID
NO: 961, SEQ ID NO: 962, SEQ ID NO: 963, SEQ ID NO: 964,
SEQ ID NO: 965, SEQ ID NO: 966, SEQ ID NO: 967, SEQ ID NO:

2100 968, SEQ ID NO: 969, SEQ ID NO: 970, SEQ ID NO: 971, SEQ ID
NO: 972, SEQ ID NO: 973, SEQ ID NO: 974, SEQ ID NO: 975,
SEQ ID NO: 976, SEQ ID NO: 977, SEQ ID NO: 979, SEQ ID NO:
980, SEQ ID NO: 981, SEQ ID NO: 982, SEQ ID NO: 983, SEQ ID 
NO: 984, SEQ ID NO : 985, SEQ ID NO : 986, SEQ ID NO: 987, 

2105 SEQ ID NO: 988, SEQ ID NO: 989, SEQ ID NO: 990, SEQ ID NO: 
991, SEQ ID NO: 992, SEQ ID NO: 993, SEQ ID NO: 994, SEQ ID 
NO: 995, SEQ ID NO: 996, SEQ. ID NO: 997, SEQ ID NO : 998, 
SEQ ID NO: 999, SEQ ID NO: 1000, SEQ ID NO: 1001, SEQ ID 
NO: 1002, SEQ ID NO: 1003, SEQ ID NO: 1004, SEQ ID NO: 1005, 

2110 SEQ ID NO: 1006, SEQ ID NO: 1007, SEQ ID NO: 1008, SEQ ID
NO: 1009, SEQ ID NO: 1010, SEQ ID NO: 1011, SEQ ID NO: 1012,
SEQ ID NO: 1014, SEQ ID NO: 1015, SEQ ID NO: 1016, SEQ ID 
NO: 1017, SEQ ID NO: 1018, SEQ ID NO: 1019, SEQ ID NO: 1020, 
SEQ ID NO: 1021, SEQ ID NO: 1022, SEQ ID NO: 1023, SEQ ID 

2115 NO: 1024, SEQ ID NO: 1025, SEQ ID NO: 1026, SEQ ID NO: 1027,
SEQ ID NO: 1028, SEQ ID NO: 1030, SEQ ID NO: 1031, SEQ ID
NO: 1032, SEQ ID NO: 1033, SEQ ID NO: 1034, SEQ ID NO: 1035,
SEQ ID NO: 1036, SEQ ID NO: 1037, SEQ ID NO: 1038, SEQ ID
NO: 1039, SEQ ID NO: 1040, SEQ ID NO: 1041, SEQ ID NO: 1042,

2120 SEQ ID NO: 1043, SEQ ID NO: 1044, SEQ ID NO: 1045, SEQ ID
NO: 1046, SEQ ID NO: 1047, SEQ ID NO: 1048, SEQ ID NO: 1049,
SEQ ID NO: 1050, SEQ ID NO: 1051, SEQ ID NO: 1052, SEQ ID
NO: 1053, SEQ ID NO: 1054, SEQ ID NO: 1056, SEQ ID NO: 1057,
SEQ ID NO: 1058, SEQ ID NO: 1059, SEQ ID NO: 1061, SEQ ID

2125 NO: 1062, SEQ ID NO: 1063, SEQ ID NO: 1064, SEQ ID NO: 1065,
SEQ ID NO: 1066, SEQ ID NO: 1067, SEQ ID NO: 1068, SEQ
ID NO: 1069, SEQ ID NO: 1070, SEQ ID NO: 1071, SEQ ID NO:
1072, SEQ ID NO: 1073, SEQ ID NO: 1074, SEQ ID NO: 1075,
SEQ ID NO: 1076, SEQ ID NO: 1077, SEQ ID NO: 1078, SEQ ID

2130 NO: 1079, SEQ ID NO: 1080, SEQ ID NO: 1081, SEQ ID NO: 1082,
SEQ ID NO: 1083, SEQ ID NO: 1084, SEQ ID NO: 1085, SEQ
ID NO: 1086, SEQ ID NO: 1087, SEQ ID NO: 1088, SEQ ID NO:
1089, SEQ ID NO: 1090, SEQ ID NO: 1091, SEQ ID NO: 1092, 
SEQ ID NO: 1094, SEQ ID NO: 1095, SEQ. ID NO: 1096, SEQ ID 

2135 NO: 1097, SEQ ID NO: 1098, SEQ ID NO: 1099, SEQ ID NO: 1100, 
SEQ ID NO: 1101, SEQ ID NO: 1102, SEQ ID NO: 1103, SEQ ID
NO: 1104, SEQ ID NO: 1105, SEQ ID NO: 1106, SEQ ID NO: 1107, 
SEQ ID NO: 1108, SEQ ID NO: 1109, SEQ ID NO: 1110, SEQ ID 
NO: 1111, SEQ ID NO: 1112, SEQ ID NO : 1113, SEQ ID NO: 1114, 

2140 SEQ ID NO: 1115, SEQ ID NO: 1116, SEQ ID NO: 1117, SEQ ID 
NO: 1118, SEQ ID NO: 1119, SEQ ID NO: 1120, SEQ ID NO: 1121,
SEQ ID NO: 1122, SEQ ID NO: 1123, SEQ ID NO: 1124, SEQ
ID NO: 1125, SEQ ID NO: 1126, SEQ ID NO: 1127, SEQ ID NO:
1129, SEQ ID NO: 1130, SEQ ID NO: 1131, SEQ ID NO: 1132,

2145 SEQ ID NO: 1133, SEQ ID NO: 1134, SEQ ID NO: 1135, SEQ ID
NO: 1136, SEQ ID NO: 1137, SEQ ID NO: 1138, SEQ ID NO: 1139,
SEQ ID NO: 1140, SEQ ID NO: 1141, SEQ ID NO: 1142, SEQ
ID NO: 1143, SEQ ID NO: 1144, SEQ ID NO: 1145, SEQ ID NO:
1146, SEQ ID NO: 1147, SEQ ID NO: 1148, SEQ ID NO: 1149,

2150 SEQ ID NO: 1150, SEQ ID NO: 1151, SEQ ID NO: 1152, SEQ ID
NO: 1153, SEQ ID NO: 1154, SEQ ID NO: 1155, SEQ ID NO: 1156,
SEQ ID NO: 1158, SEQ ID NO: 1159, SEQ ID NO: 1160, SEQ ID
NO: 1161, SEQ ID NO: 1162, SEQ ID NO: 1163, SEQ ID NO: 1164,
SEQ ID NO: 1165, SEQ ID NO: 1166, SEQ ID NO: 1167, SEQ ID

2155 NO: 1168, SEQ ID NO: 1169, SEQ ID NO: 1170, SEQ ID NO: 1171,
SEQ ID NO: 1172, SEQ ID NO: 1173, SEQ ID NO: 1174, SEQ ID
NO: 1175, SEQ ID NO: 1176, SEQ ID NO: 1177, SEQ ID NO: 1178,
SEQ ID NO: 1179, SEQ ID NO: 1180, SEQ ID NO: 1181, SEQ
ID NO: 1182, SEQ ID NO: 1183, SEQ ID NO: 1184, SEQ ID NO:

2160 1185, SEQ ID NO: 1186, SEQ ID NO: 1187, SEQ. ID NO: 1188, 
SEQ ID NO: 1189, SEQ ID NO: 1190, SEQ ID NO: 1192, SEQ ID 
NO: 1193, SEQ ID NO: 1194, SEQ ID NO: 1195, SEQ ID NO: 1196, 
SEQ ID NO: 1197, SEQ ID NO: 1198, SEQ ID NO: 1199, SEQ
ID NO: 1200, SEQ ID NO: 1201, SEQ ID NO: 1202, SEQ ID NO:
2165 1203, SEQ ID NO: 1204, SEQ ID NO: 1205, SEQ ID NO: 1206,
SEQ ID NO: 1207, SEQ ID NO: 1208, SEQ ID NO: 1209, SEQ ID
NO: 1210, SEQ ID NO: 1211, SEQ ID NO: 1213, SEQ ID NO: 1214,
SEQ ID NO: 1215, SEQ ID NO: 1216, SEQ ID NO: 1217, SEQ ID
NO: 1218, SEQ ID NO: 1219, SEQ ID NO: 1220, SEQ ID NO: 1221, 

2170 SEQ ID NO: 1222, SEQ ID NO: 1223, SEQ ID NO: 1224, SEQ ID 
NO: 1225, SEQ ID NO: 1226, SEQ ID NO: 1227, SEQ ID NO: 1228,
SEQ ID NO: 1229, SEQ ID NO: 1230, SEQ ID NO: 1231, SEQ ID
NO: 1232, SEQ ID NO: 1233, SEQ ID NO: 1234, SEQ ID NO: 1235,
SEQ ID NO: 1236, SEQ ID NO: 1237, SEQ ID NO: 1238, SEQ

2175 ID NO: 1239, SEQ ID NO: 1241, SEQ ID NO: 1242, SEQ ID NO:
1243, SEQ ID NO: 1244, SEQ ID NO: 1245, SEQ ID NO: 1246, 
SEQ ID NO: 1247, SEQ ID NO: 1248, SEQ ID NO: 1249, SEQ ID 
NO: 1250, SEQ ID NO: 1251, SEQ ID NO: 1252, SEQ ID NO: 1253, 
SEQ ID NO: 1254, SEQ ID NO : 1255, SEQ ID NO : 1256, SEQ 

2180 IDNO: 1257, SEQ ID NO: 1259, SEQ ID NO: 1260, SEQ. ID NO: 
1261, SEQ ID NO: 1262, SEQ ID NO: 1263, SEQ, ID NO: 1264, 
SEQ ID NO: 1265, SEQ ID NO : 1266, SEQ ID NO: 1267, SEQ ID 
NO: 1268, SEQ ID NO: 1269, SEQ ID NO: 1270, SEQ ID NO: 1271, 
SEQ ID NO: 1272, SEQ ID NO: 1273, SEQ. ID NO: 1275, SEQID 

2185 NO: 1276, SEQ ID NO: 1277, SEQ ID NO: 1278, SEQ ID NO: 1279,
SEQ ID NO: 1280, SEQ ID NO: 1281, SEQ ID NO: 1282, SEQ ID
NO: 1283, SEQ ID NO: 1284, SEQ ID NO: 1285, SEQ ID NO: 1286,
SEQ ID NO: 1287, SEQ ID NO: 1289, SEQ ID NO: 1290, SEQ ID
NO: 1291, SEQ ID NO: 1292, SEQ ID NO: 1293, SEQ ID NO: 1294,

2190 SEQ ID NO: 1295, SEQ ID NO: 1296, SEQ ID NO: 1297, SEQ
ID NO: 1298, SEQ ID NO: 1299, SEQ ID NO: 1300, SEQ ID NO:
1301, SEQ ID NO: 1303, SEQ ID NO: 1304, SEQ ID NO: 1305,
SEQ ID NO: 1306, SEQ ID NO: 1307, SEQ ID NO: 1308, SEQ ID
NO: 1310, SEQ ID NO: 1311, SEQ ID NO: 1312, SEQ ID NO: 1313,

2195 SEQ ID NO: 1314, SEQ ID NO: 1315, SEQ ID NO: 1316, SEQ
ID NO: 1317, SEQ ID NO: 1318, SEQ ID NO: 1319, SEQ ID NO:
1320, SEQ ID NO: 1322, SEQ ID NO: 1323, SEQ ID NO: 1324,
SEQ ID NO: 1325, SEQ ID NO: 1326, SEQ ID NO: 1327, SEQ ID
NO: 1328, SEQ ID NO: 1330, SEQ ID NO: 1331, SEQ ID NO: 1332,

2200 SEQ ID NO: 1333, SEQ ID NO: 1334, SEQ ID NO: 1335, SEQ ID
NO: 1336, SEQ ID NO: 1337, SEQ ID NO: 1339, SEQ ID NO: 1340,
SEQ ID NO: 1341, SEQ ID NO: 1342, SEQ ID NO: 1343, SEQ ID
NO: 1344, SEQ ID NO: 1345, SEQ ID NO: 1346, SEQ ID NO: 1347,
SEQ ID NO: 1349, SEQ ID NO: 1350, SEQ ID NO: 1351, SEQ ID

2205 NO: 1352, SEQ ID NO: 1353, SEQ ID NO: 1354, SEQ ID NO: 1355,
SEQ ID NO: 1356, SEQ ID NO: 1357, SEQ ID NO: 1358, SEQ
ID NO: 1360, SEQ ID NO: 1361, SEQ ID NO: 1362, SEQ ID NO:
1363, SEQ ID NO: 1364, SEQ ID NO: 1365, SEQ ID NO: 1367,
SEQ ID NO: 1368, SEQ ID NO: 1369, SEQ ID NO: 1370, SEQ ID

2210 NO: 1371, SEQ ID NO: 1375, SEQ ID NO: 1376, SEQ ID NO: 1377,
SEQ ID NO: 1378, SEQ ID NO: 1379, SEQ ID NO: 1381, SEQ
ID NO: 1382, SEQ ID NO: 1383, SEQ ID NO: 1384, SEQ ID NO:
1385, SEQ ID NO: 1387, SEQ ID NO: 1388, SEQ ID NO: 1389,
SEQ ID NO: 1390, SEQ ID NO: 1391, SEQ ID NO: 1392, SEQ ID

2215 NO: 1393, SEQ ID NO: 1395, SEQ ID NO: 1396, SEQ ID NO: 1397,
SEQ ID NO: 1398, SEQ ID NO: 1399, SEQ ID NO: 1400, SEQ ID
NO: 1402, SEQ ID NO: 1403, SEQ ID NO: 1404, SEQ ID NO: 1405,
SEQ ID NO: 1406, SEQ ID NO: 1407, SEQ ID NO: 1409, SEQ ID
NO: 1410, SEQ ID NO: 1412, SEQ ID NO: 1413, SEQ ID NO: 1414,

2220 SEQ ID NO: 1415, SEQ ID NO: 1416, SEQ ID NO: 1417, SEQ ID
NO: 1419, SEQ ID NO: 1420, SEQ ID NO: 1421, SEQ ID NO: 1422,
SEQ ID NO: 1423, SEQ ID NO: 1424, SEQ ID NO: 1425, SEQ
ID NO: 1427, SEQ ID NO: 1428, SEQ ID NO: 1429, SEQ ID NO:
1430, SEQ ID NO: 1431, SEQ ID NO: 1432, SEQ ID NO: 1433, 

2225 SEQ ID NO: 1434, SEQ ID NO: 1435, SEQ ID NO: 1437, SEQ ID 
NO: 1438, SEQ ID NO: 1439, SEQ ID NO: 1440, SEQ ID NO: 1441,
SEQ ID NO: 1442, SEQ ID NO: 1444, SEQ ID NO: 1445, SEQ
ID NO: 1446, SEQ ID NO: 1447, SEQ ID NO: 1448, SEQ ID NO:
1449, SEQ ID NO: 1451, SEQ ID NO: 1452, SEQ ID NO: 1453,

2230 SEQ ID NO: 1454, SEQ ID NO: 1455, SEQ ID NO: 1456, SEQ ID
NO: 1458, SEQ ID NO: 1459, SEQ ID NO: 1461, SEQ ID NO: 1462,
SEQ ID NO: 1463, SEQ ID NO: 1464, SEQ ID NO: 1465, SEQ ID
NO: 1466, SEQ ID NO: 1468, SEQ ID NO: 1469, SEQ ID NO: 1470,
SEQ ID NO: 1472, SEQ ID NO: 1474, SEQ ID NO: 1475, SEQ ID

2235 NO: 1476, SEQ ID NO: 1477, SEQ ID NO: 1479, SEQ ID NO: 1480,
SEQ ID NO: 1481, SEQ ID NO: 1482, SEQ ID NO: 1483, SEQ ID
NO: 1484, SEQ ID NO: 1485, SEQ ID NO: 1486, SEQ ID NO: 1488,
SEQ ID NO: 1490, SEQ ID NO: 1491, SEQ ID NO: 1492, SEQ
ID NO: 1493, SEQ ID NO: 1495, SEQ ID NO: 1496, SEQ ID NO:

2240 1497, SEQ ID NO: 1498, SEQ ID NO: 1500, SEQ ID NO: 1502,
SEQ ID NO: 1503, SEQ ID NO: 1504, SEQ ID NO: 1505, SEQ ID
NO: 1507, SEQ ID NO: 1509, SEQ ID NO: 1512, SEQ ID NO: 1513,
SEQ ID NO: 1514, SEQ ID NO: 1515, SEQ ID NO: 1517, SEQ
ID NO: 1518, SEQ ID NO: 1519, SEQ ID NO: 1521, SEQ ID NO:

2245 1522, SEQ ID NO: 1523, SEQ ID NO: 1524, SEQ ID NO: 1525, 
SEQ ID NO: 1527, SEQ ID NO: 1528, SEQ ID NO: 1529, SEQ ID 
NO: 1530, SEQ ID NO: 1531, SEQ ID NO: 1533, SEQ ID NO: 1534, 
SEQ ID NO: 1535, SEQ ID NO: 1536, SEQ ID NO: 1538, SEQ ID
NO: 1539, SEQ ID NO: 1541, SEQ ID NO: 1542, SEQ ID NO: 1543,

2250 SEQ ID NO: 1544, SEQ ID NO: 1546, SEQ ID NO: 1548, SEQ ID
NO: 1550, SEQ ID NO: 1552, SEQ ID NO: 1554, SEQ ID NO: 1556,
SEQ ID NO: 1557, SEQ ID NO: 1559, SEQ ID NO: 1560, SEQ ID
NO: 1561, SEQ ID NO: 1562, SEQ ID NO: 1564, SEQ ID NO: 1565,
SEQ ID NO: 1567, SEQ ID NO: 1568, SEQ ID NO: 1570, SEQ

2255 ID NO: 1572, SEQ ID NO: 1573, SEQ ID NO: 1574, SEQ ID NO:
1575, SEQ ID NO: 1577, SEQ ID NO: 1578, SEQ ID NO: 1579,
SEQ ID NO: 1581, SEQ ID NO: 1582, SEQ ID NO: 1583, SEQ ID
NO: 1585, SEQ ID NO: 1586, SEQ ID NO: 1588, SEQ ID NO: 1589,
SEQ ID NO: 1590, SEQ ID NO: 1592, SEQ ID NO: 1593, SEQ

2260 ID NO: 1595, SEQ ID NO: 1597, SEQ ID NO: 1598, SEQ ID NO:
1600, SEQ ID NO: 1602, SEQ ID NO: 1606, SEQ ID NO: 1608,
SEQ ID NO: 1609, SEQ ID NO: 1610, SEQ ID NO: 1611, SEQ ID
NO: 1613, SEQ ID NO: 1614, SEQ ID NO: 1616, SEQ ID NO: 1618,
SEQ ID NO: 1620, SEQ ID NO: 1621, SEQ ID NO: 1623, SEQ ID

2265 NO: 1625, SEQ ID NO: 1628, SEQ ID NO: 1630, SEQ ID NO: 1631,
SEQ ID NO: 1633, SEQ ID NO: 1634, SEQ ID NO: 1638, SEQ ID
NO: 1641, SEQ ID NO: 1642, SEQ ID NO: 1644, SEQ ID NO: 1645,
SEQ ID NO: 1647, SEQ ID NO: 1648, SEQ ID NO: 1650, SEQ ID
NO: 1651, SEQ ID NO: 1653, SEQ ID NO: 1654, SEQ ID NO: 1656,

2270 SEQ ID NO: 1657, SEQ ID NO: 1659, SEQ ID NO: 1661, SEQ
ID NO: 1663, SEQ ID NO: 1665, SEQ ID NO: 1667, SEQ ID NO:
1671, SEQ ID NO: 1674, SEQ ID NO: 1676, SEQ ID NO: 1678,
SEQ ID NO: 1679, SEQ ID NO: 1681, SEQ ID NO: 1684, SEQ ID
NO: 1686, SEQ ID NO: 1687, SEQ ID NO: 1689, SEQ ID NO: 1692,

2275 SEQ ID NO: 1693, SEQ ID NO: 1695, SEQ ID NO: 1697, SEQ
ID NO: 1698, SEQ ID NO: 1702, and SEQ ID NO: 1703,
or (b) an amino acid sequence in the amino acid
sequences set forth in (a) in which several amino acids are
deleted, replaced or added.

2280 [0014] 

The nucleic-acid molecule specific to enterohemorrhagic
pathogenic E. coli O-157:H7 of the present invention, a gene
included in the nucleic-acid molecule and a protein or a
polypeptide encoded by the gene are found by determining all

2285 nucleotide sequences on the chromosome of O-157:H7 SAKAI
and identifying a region and a nucleotide sequence specific to
O-157:H7 which are absent from nonpathogenic E. coli K-12.
The chromosomal nucleotide sequences of O-157:H7 determined
by the present invention were registered on June 26, 2000,

2290 as Accession No. BA000007 in GenBank/DDBJ.
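
The comparison described above (finding regions of one chromosome that are absent from another) can be illustrated with a minimal sketch. This is not the procedure used in the present invention, which the text does not specify; it assumes, purely for illustration, an exact k-mer comparison, and the function names and the parameter k are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def specific_regions(target, reference, k=8):
    """Return half-open (start, end) intervals of `target` spanned by runs of
    consecutive k-mers that never occur in `reference`.

    Assumption: exact k-mer matching stands in for a real genome alignment.
    """
    ref_kmers = kmers(reference, k)
    regions = []
    start = None
    for i in range(len(target) - k + 1):
        absent = target[i:i + k] not in ref_kmers
        if absent and start is None:
            start = i                      # a run of absent k-mers begins
        elif not absent and start is not None:
            # the last absent k-mer started at i - 1 and ends at i - 1 + k
            regions.append((start, i - 1 + k))
            start = None
    if start is not None:
        regions.append((start, len(target)))
    return regions


# Toy example: an 8-base insert placed in an otherwise shared sequence.
reference = "AAAACCCCGGGGTTTT"
target = "AAAACCCC" + "TGCATGCA" + "GGGGTTTT"
regions = specific_regions(target, reference, k=4)
```

With exact k-mers the reported interval extends up to k-1 bases into the shared flanks on each side, so the boundaries are approximate.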
[0015] 

Furthermore, after the registration of the whole
chromosomal nucleotide sequence of O-157:H7 based on the
present invention, nucleotide sequences closely similar to those

2295 of the present invention were registered on October 22, 2000
(GenBank/AE005174). However, when these sequences were
registered, they had two gaps and 2600 or more
characters other than AGCT (undetermined bases). Thus the
sequences were imperfect. In addition, although the data

2300 thereof were updated on September 25, 2001 and October 26,
2001, only one gap sequence was determined and 2600 or

more undetermined bases remained.
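
As a small illustration of the count of "characters other than AGCT" mentioned above, the following sketch tallies undetermined bases in a sequence string. It assumes the sequence has already been stripped of header lines and whitespace; the function name is hypothetical.

```python
def count_undetermined(seq):
    """Count characters other than A, G, C and T (case-insensitive).

    Assumption: `seq` contains only sequence characters, so every
    non-AGCT character (N, R, Y, ...) is an undetermined base.
    """
    return sum(1 for base in seq.upper() if base not in "AGCT")


example_count = count_undetermined("AGCTNNRYagct")
```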

[0016] 

In addition, for the genetic information obtained, homology
2305 search and prediction of putative ORFs and their functions
may be performed by comparing the amino acid sequences with
all sequences found in the GenBank, DDBJ, SWISS-PROT and PIR
databases using an algorithm known in the art, for example,
the BLAST algorithm and the like.
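
The list below reports homology as "N% identity in M amino acids". A minimal sketch of that figure follows, assuming two sequences that have already been aligned (with "-" as the gap character) and assuming identity is counted over all alignment columns; it does not perform the BLAST search itself, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Return (percent identity, alignment length) for two aligned sequences.

    Assumptions: both strings come from the same pairwise alignment, '-'
    marks a gap, and identity is identical columns / total columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a), len(aligned_a)


pct, length = percent_identity("MKT-LV", "MRTALV")
```

Other conventions (for example, excluding gap columns from the denominator) give different percentages for the same alignment.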
2310 [0017] 

The O-157:H7 specific polypeptides of the present
invention are proteins or polypeptides having the characteristics set
forth in the tables described below. Based on the amino acid
sequence information, the polypeptides are classified into the

2315 following groups: 1) Proteins having unknown function etc., 2)
Proteins which have unknown function, but have significant
homology to those of other bacteria, 3) Proteins comprising
Insertion Sequence (IS), 4) Proteins derived from phage, 5)
Regulatory element, 6) Proteins relating to fimbriae, 7)

2320 Proteins relating to transportation of substance, 8) Proteins
relating to synthesis of lipopolysaccharide, 9) Proteins
relating to metabolism, 10) Proteins processing DNA/RNA, 11)
Proteins relating to pathogenicity, 12) Other proteins.
[0018]

2325 List: polypeptides specific to enterohemorrhagic pathogenic E.
coli O-157:H7

1) Proteins having a novel function

Sequence number: Hydrophobicity, the number of amino acids,
character such as function
2330 SEQ ID NO: 143: 0.610526, 39, novel 

SEQ ID NO: 1438: -0.041667, 109, novel

SEQ ID NO: 1439: -0.505392, 817, an outer membrane usher
protein precursor, similar to outer membrane usher protein
precursors, for example, YehB [Escherichia coli K-12]
2335 gi | 465572 | sp | P33341 | YEHB#ECOLI (58% identity in the
amino acids)

SEQ ID NO: 1440: -0.23304, 228, a putative fimbrial chaperone,
similar to fimbrial chaperone, for example, YehC [Escherichia
coli] gi | 465573 | sp | P33342 | YEHC#ECOLI (56% identity in 221

2340 amino acids), GTG start 

SEQ ID NO: 1441 : -0.121469, 178, a fimbrial major protein, 
similar to YehD [Escherichia coli] 

gi | 465574 | sp | P33343 | YEHD#ECOLI (26% identity in
177 amino acids), and similar to long polar fimbrial major

2345 proteins [Salmonella typhimurium] 

gi | 1170815 | sp | P43660 | LPFA#SALTY (25% identity in 175 
amino acids) 

SEQ ID NO: 1442: -0.445877, 474, novel 

SEQ ID NO: 1702: -0.448052, 78, similar to F plasmid CcdA 
2350 protein (LetA protein) [Escherichia coli] 

gi | 9507755 | ref | NP#061421.1 (30% identity in 70 amino acids)
SEQ ID NO: 1703: 0.210577, 105, similar to F plasmid CcdB
protein (LetB protein) [Escherichia coli]

gi | 9507756 | ref | NP#061422.1 (35% identity in 104 amino acids)
2355 SEQ ID NO: 1663: -0.478836, 190, similar to YABP#ECOLI
gi | 2506583 | sp | P39220 (38% identity in 168 amino acids)
SEQ ID NO: 1387: 0.060434, 370, a fimbrial protein, similar to
putative fimbrial proteins, for example, [Escherichia
coli] gi | 538781 | pir | | B47152 (27% identity in the amino acids), 
2360 and long polar fimbrial minor protein LpfE [Salmonella 
typhimurium] gi | 1170819 | sp | P43664 | LPFE#SALTY (27% 
identity in 157 amino acids) 

SEQ ID NO: 1388: -0.140816, 197, a putative fimbrial protein, 
similar to putative fimbrial protein YadK [Escherichia coli] 
2365 gi | 549488 | sp | P37016 | YADK#ECOLI (40% identity in 190
amino acids)

SEQ ID NO: 1389: -0.034826, 202, a putative fimbrial protein,
similar to putative fimbrial protein YadL [Escherichia coli]
gi | 549489 | sp | P37017 | YADL#ECOLI (41% identity in 192 amino
acids) 

SEQ ID NO: 1390: -0.011828, 187, a putative fimbrial protein, 
similar to putative fimbrial-like protein YadM [Escherichia coli] 
gi | 549490 | sp | P37018 | YADM#ECOLI (49% identity in 173 
amino acids) 

SEQ ID NO: 1391 : -0.387529, 867, similar to HTRE#ECOLI 
gi | 1786332 (60% identity in 849 amino acids) [a putative outer
membrane porin protein]

SEQ ID NO: 1392: -0.250623, 242, similar to ECPD#ECOLI
gi | 1786333 (60% identity in 239 amino acids) [a putative pilin
chaperone]

SEQ ID NO: 1393: 0.058586, 199, similar to YADN#ECOLI 
gi | 1786334 (39% identity in 195 amino acids) [a putative 
fimbrial-like protein] 

SEQ ID NO: 979: -0.333674, 99, novel 

SEQ ID NO: 980: -0.245638, 150, novel

SEQ ID NO: 981 : -0.622325, 216, novel, TTG start 

SEQ ID NO: 982: -0.842466, 74, novel 

SEQ ID NO: 983: -0.172956, 160, novel, similar to hypothetical 
44.2 kD protein YhhZ [Escherichia coli (strain K-12)]
gi | 1176284 | sp | P46855 | YHHZ#ECOLI (38% identity in 148
amino acids); and hemolysin-coregulated protein Hcp [Vibrio
cholerae] gi | 7467495 | pir | | T10891 (32% identity in 149 amino 
acids) 

SEQ ID NO: 984: -0.448614, 470, novel 

SEQ ID NO: 985 : -0.402126, 1036, novel, similar to IcmF 
protein [Legionella pneumophila] gi | 7465644 | pir | | T18341
(20% identity in 1037 amino acids) 



SEQ ID NO: 986: 0.637097, 63, novel, GTG start
SEQ ID NO: 987: -0.321591, 265, novel, GTG start
SEQ ID NO: 988: -0.206311, 207, novel
SEQ ID NO: 989: 0.001619, 248, novel

SEQ ID NO: 990: -0.129036, 924, a putative ATP-dependent Clp
protease ATP-binding chain, similar to ATP-dependent Clp
protease ATP-binding chain, for example, ClpB,
2405 gi | 7428220 | pir | | T07807, (40% identity in 753 amino acids)

SEQ ID NO: 991: -0.11502, 254, novel [a putative membrane 
protein; IMP] 

SEQ ID NO: 992: -0.345146, 444, novel, its C-terminal part is
similar to hypothetical protein z2 9f [Vibrio cholerae]
2410 gi | 3341578 | emb | Caa13133.1 | (51% identity in 104 amino acids)
SEQ ID NO: 993 : -0.308046, 175, novel [a hypothetical 
lipoprotein] 

SEQ ID NO: 994: -0.442019, 427, novel 
SEQ ID NO: 995: -0.298333, 361 , novel 

2415 SEQ ID NO: 996: -0.314935, 617, novel 

SEQ ID NO: 997: -0.648175, 138, novel, similar to base plate 
proteins and acidic lysozymes [coliphage T4]

gi | 137980 | sp | P09425 | VG25#BPT4 (34% identity in 62 amino
acids) (at low level) 

2420 SEQ ID NO: 998 : -0.380777, 464, novel, similar to 
hypothetical 54.5 kDa protein [Edwardsiella ictaluri] 
gi | 2708666 | gb | aaB92576.1 | (41% identity in 461 amino acids)
SEQ ID NO: 999: 0.109459, 75, novel 

SEQ ID NO: 1000 : -0.366868, 167, novel, similar to a 
2425 hypothetical protein [Escherichia coli] 

gi | 2920642 | gb | aaC32477.1 | (99% identity in 166 amino acids);
and a hypothetical 19.5 kDa protein [Edwardsiella ictaluri]
gi | 2708667 | gb | aaB92577.1 | (32% identity in 148 amino acids)
SEQ ID NO: 1001 : -0.39593, 173, novel 
2430 SEQ ID NO: 1002: -0.16, 46, novel 

SEQ ID NO: 1003 : -0.416269, 714, novel, similar to VgrG 
proteins, for example, [Escherichia coli strain ecll] 
gi | 2920640 | gb | aaC32475.1 | (98% identity in 713 amino acids) 
SEQ ID NO: 1004: -0.707907, 1405, an Rhs protein, similar to 
2435 RhsH protein, for example, [Escherichia coli strain EC45] 
gi | 2920634 | gb | aaC32471.1 | (92% identity in 1264 amino acids)
SEQ ID NO: 1005: -0.704433, 204, novel, similar to YbeQ
[Escherichia coli] gi | 3025010 | sp | P77234 | (23% identity in 172
amino acids); and YibG [Escherichia coli]

2440 gi | 418454 | sp | P32106 | YIBG#ECOLI (30% identity in 89 amino
acids) 

SEQ ID NO: 1006: -0.305, 61, novel 

SEQ ID NO: 1007 : 1.333333, 97, novel [a hypothetical 
membrane protein; IMP] 
2445 SEQ ID NO: 1008 : -0.33836, 379, novel, similar to H 
repeat-associated proteins, for example, [Escherichia coli RhsB 
element] gi | 140772 | sp | P28912 | (97% identity in 378 amino
acids) 

SEQ ID NO: 1009: -0.746417, 587, an Rhs protein, similar to 
2450 Rhs core proteins, for example, RhsE [Escherichia coli]
gi | 2507113 | sp | P24211 | RHSE#ECOLI, TTG start
SEQ ID NO: 1010: 0.701786, 57, novel, similar to N-terminal
part of hypothetical protein, for example, ORF E2 in Rhs
element [Escherichia coli] gi | 2851489 | sp | P31991 | (92%
2455 identity in 56 amino acids)

SEQ ID NO: 1011: -0.614943, 88, novel, similar to C-terminal
part of hypothetical protein, for example, ORF E2 in Rhs
element [Escherichia coli] gi | 2851489 | sp | P31991 | (99%
identity in 108 amino acids)
2460 SEQ ID NO: 1012: -0.31718, 391, novel, similar to H
repeat-associated proteins, for example, [Escherichia coli RhsB
element] gi | 7465875 | pir | | E64898 (58% identity in 372 amino
acids), GTG start 

SEQ ID NO: 1094: -0.673765, 325, a putative integrase, similar 
2465 to integrases, for example, [Shigella flexneri bacteriophage V] 
gi | 2465477 | gb | aaB72135.1 | (88% identity in 305 amino acids)
SEQ ID NO: 1095: -1.175308, 82, a transcription
antitermination protein, partially similar to transcription
antitermination protein N [Bacteriophage lambda]
2470 gi | 73111 | pir | | VNBPL, (90% identity in 42 amino acids), may
be disrupted

SEQ ID NO: 1096: -0.473644, 130, novel, similar to N-terminal
part of hypothetical protein HP1334 [Helicobacter pylori (strain
26695)] gi | 7464516 | pir | | F64686 (36% identity in 111 amino
2475 acids); and N-terminal part of hypothetical protein [Neisseria 
meningitidis] gi | 6900422 | emb | CAB72032.1 | (31% identity in 
113 amino acids) 

SEQ ID NO: 1097: -0.28903, 238, a prophage repressor CI,
similar to prophage repressor CI, for example, [Bacteriophage
2480 HK97] gi | 6901592 | gb | aaF31095.1 | AF069529#8 (AF069529)
(99% identity in 237 amino acids)

SEQ ID NO: 1098: -0.488364, 67, a Cro repressor, identical to
regulatory protein Cro [phage lambda] gi | 73101 | pir | | RCBPL;
and similar to Cro protein, for example, [Bacteriophage HK97]
2485 gi | 6901626 | gb | aaF31129.1 | (98% identity in 66 amino acids)

SEQ ID NO: 1099: -0.309278, 98, a regulatory protein cII,
identical to regulatory protein cII [Bacteriophage lambda]
gi | 73106 | pir | | QCBP2L

SEQ ID NO: 1100: -0.622772, 203, a phage replication protein, 
2490 similar to N-terminal part of phage replication protein, for 
example, O protein [Bacteriophage lambda] 

gi | 75891 | pir | | ORBPL (88% identity in 163 amino acids),
interrupted by frameshift

SEQ ID NO: 1101: -0.811784, 171, a phage replication protein,
2495 similar to C-terminal part of replication protein, for example,
protein O [Bacteriophage lambda] gi | 75891 | pir | | ORBPL (98%
identity in 188 amino acids), interrupted by frameshift
SEQ ID NO: 1102: -0.002913, 104, a replication protein, its
N-terminal part (amino acids at the position 1-21) is identical
2500 to replication protein P, for example, [Bacteriophage lambda]
gi | 75893 | pir | | PQBPL, probably disrupted

SEQ ID NO: 1103: -0.026894, 265, a putative tail fiber protein,
partially similar to tail fiber proteins, for example,
[Bacteriophage HK97] gi | 6901608 | gb | aaF31111.1 | (AF069529)



Appendix B: Hideo et al. Full Translation

2505 (42% identity in 155 amino acids); and similar to Sc/SvQ 
protein (DNA inversion product) [Escherichia coli plasmid 
p15B], for example, gi | 96420 | pir | | S18690 (45% identity in 159
amino acids)

SEQ ID NO: 1104: -0.33198, 198, novel, similar to hypothetical
2510 proteins, for example, YcfA protein [Escherichia coli]
gi | 2506641 | sp | P09153 | YCFA#ECOLI (65% identity in 196
amino acids); Gp29 [Bacteriophage HK97]

gi | 6901609 | gb | aaF31112.1 | (66% identity in 192 amino acids);
and T protein [Escherichia coli plasmid p15B]

2515 gi | 96096 | pir | | S18684 (55% identity in 184 amino acids) 

SEQ ID NO: 1105 : -0.586394, 148, novel, similar to hypothetical 
proteins, for example, YfdK [Escherichia coli (strain K-12)]
gi | 3915468 | sp | P77656 | YFDK#ECOLI (68% identity in 144 
amino acids) 

2520 SEQ ID NO: 1106: -0.114706, 137, a putative tail fiber protein, 
similar to hypothetical proteins, for example, YfdL [Escherichia 
coli (strain K-12)] gi | 2495635 | sp | P76508 | YFDL#ECOLI (52% 
identity in 67 amino acids); and putative tail fiber protein YcfE
[Escherichia coli cryptic prophage e14]

2525 gi | 7444558 | pir | | B64861 (51% identity in 45 amino acids)

SEQ ID NO: 1107: -0.234783, 185, a DNA-invertase, similar to
DNA-invertases, for example, Pin [Escherichia coli]
gi | 72978 | pir | | JWEC (96% identity in 184 amino acids)
SEQ ID NO: 1108: -0.386771, 258, novel, similar to hypothetical

2530 protein [Deinococcus radiodurans (strain R1)]

gi | 7472205 | pir | | B75431 (32% identity in 249 amino acids)
SEQ ID NO: 1109: 0.763265, 50, novel 

SEQ ID NO: 1110 : 0.052227, 248, a putative transcription 
regulatory element, similar to transcription regulatory 
2535 elements, for example, putative AraC-type regulatory protein
YdeO gi | 6176587 | sp | P76135 | YDEO#ECOLI (34% identity in
247 amino acids)

SEQ ID NO: 1111: -0.741026, 118, novel, similar to C-terminal
part of hypothetical protein, for example, [Escherichia coli
2540 insertion sequence IS2] gi | 140808 | sp | P19777 | YI22#ECOLI
(77% identity in 113 amino acids), may be disrupted
SEQ ID NO: 1112: -0.510941, 394, a putative integrase, similar
to integrases, for example, [phage phi-R73]

gi | 93827 | pir | | A42465 (61% identity in 388 amino acids)
2545 SEQ ID NO: 1113: -0.468841, 139, novel, GTG start
SEQ ID NO: 1114: -0.227805, 206, novel
SEQ ID NO: 1115: -0.045395, 153, novel 
SEQ ID NO: 1116: -0.460952, 211, novel 

SEQ ID NO: 1117 : -0.462755, 197, novel, similar to 
2550 hypothetical protein PFB0765w [malaria parasite]

gi | 7494317 | pir | | E71606 (24% identity in 193 amino acids) (at 
low level), TTG start 

SEQ ID NO: 1118: -0.432979, 189, novel 

SEQ ID NO: 1119 : -0.854445, 91, a putative transcription 
2555 activator, similar to Ogr family, for example, LsrS
[Rahnella aquatilis] gi | 93826 | pir | | E42465 (41% identity in
65 amino acids); and delta protein [phage phi-R73]
gi | 93826 | pir | | E42465 (36% identity in 76 amino acids)
SEQ ID NO: 1120: -0.291803, 184, a putative polarity
2560 suppression protein (amber mutation-suppression); similar to
Psu-like proteins, for example, Psu [Bacteriophage P4]
gi | 1351414 | sp | P05460 | VPSU#BPP4 (30% identity in 166
amino acids) 

SEQ ID NO: 1121 : -0.4748, 251, a head size determination 
2565 [protein], similar to head size determination proteins, for 
example, Sid [phage phi-R73] gi | 93821 | pir | | F42465 (22% 
identity in 236 amino acids) 

SEQ ID NO: 1122 : -0.126744, 87, a putative DNA binding 
protein, similar to hypothetical proteins, for example, putative 
2570 DNA-binding protein ORF88 [satellite phage P4]
gi | 140147 | sp | P12552 | Y9K#BPP4 (65% identity in 82 amino
acids)
SEQ ID NO: 1123: 0.40973, 186, a CI phage repressor, similar
to CI repressors, for example, [Bacteriophage P4]
2575 gi | 1262833 | emb | Caa35902.1 | (67% identity in 115 amino
acids)

SEQ ID NO: 1124: -0.149315, 74, novel

SEQ ID NO: 1125: 0.202804, 108, a putative copy number
control protein, similar to orf 106 [satellite phage P4]

2580 gi | 75896 | pir | | QQBPP4 (71% identity in 98 amino acids)

SEQ ID NO: 1126: -0.193179, 778, a putative DNA primase, 
similar to DNA primases, for example, alpha gene product 
[satellite phage P4] gi | 130905 | sp | P10277 | PRIM#BPP4 (72%
identity in 770 amino acids) 

2585 SEQ ID NO: 1127 : -0.333019, 319, novel, similar to 
hypothetical protein 111401 [Synechocystis sp. (strain PCC 
6803)] gi | 7470073 | pir | | S74462 (21% identity in 206 amino 
acids), GTG start 

SEQ ID NO: 1451: 0.23625, 241, a putative oxidoreductase,
2590 similar to oxidoreductases, for example, [Streptomyces 
coelicolor A3(2)] gi | 6137024 | emb | CAB59579.1 | (55% identity 
in 237 amino acids) 

SEQ ID NO: 1452: 0.520652, 93, novel [hypothetical membrane 
protein; IMP] 

2595 SEQ ID NO: 1453: 0.246154, 53, novel 

SEQ ID NO: 1454: -0.246667, 301, a putative transcription 
regulatory element (LysR family), similar to transcription 
regulatory elements, for example, [Xylella fastidiosa]
gi | 9106842 | gb | aaF84577.1 | AE003999#5 (40% identity in

2600 290 amino acids)

SEQ ID NO: 1455: -0.309788, 379, novel, similar to
hypothetical protein, for example, [Pseudomonas aeruginosa]
gi | 732227 | sp | Q01609 | YODE#PSEAE (54% identity in
376 amino acids)

2605 SEQ ID NO: 1456: 0.996977, 398, a putative transporter
protein, similar to transporters, for example, OpdE
[Pseudomonas aeruginosa]
gi | 400678 | sp | Q01602 | OPDE#PSEAE (60% identity in
396 amino acids)

2610 SEQ ID NO: - : 0.215625, 97, novel 

SEQ ID NO: 1577 : -0.388722, 134, novel, similar to 
hypothetical proteins, for example, L0013 [Escherichia coli
O-157:H7 strain EDL933] gi | 3414881 | gb | aaC31492.1 | (99%
identity in 133 amino acids), GTG start

2615 SEQ ID NO: 1578: 0.010435, 116, novel, similar to hypothetical
protein, for example, L0014 [Escherichia coli O-157:H7 strain
EDL933] gi | 3288157 | emb | Caa11510.1 | (100% identity in 115
amino acids)

SEQ ID NO: 1579: -0.445312, 513, novel, similar to
2620 hypothetical proteins, for example, L0015 [Escherichia coli
O-157:H7 strain EDL933] gi | 3414883 | gb | aaC31494.1 | (100%
identity in 512 amino acids)

SEQ ID NO: - : -0.171316, 381, a putative NADH-dependent
flavin oxidoreductase, similar to YqiG [Bacillus subtilis] 
2625 gi | 1731054 | sp | P54524 | YQIG#BACSU (40% identity in 380 
amino acids) 

SEQ ID NO: 1495 : -0.089543, 307, novel, similar to 
hypothetical proteins, for example, [Escherichia coli K-12]
gi | 3183244 | sp | P76049 | YCJY#ECOLI (40% identity in 294

2630 amino acids) [in Tpx-Fnr intergenic region]

SEQ ID NO: 1496: -0.058117, 309, a putative transcription
regulatory element, similar to transcription regulatory
elements, for example, [Escherichia coli]

gi | 2495398 | sp | P75836 | YCAN#ECOLI (38% identity in 291

2635 amino acids) [in DmsC-PflA intergenic region]
SEQ ID NO: 1497: -0.218644, 119, novel 

SEQ ID NO: 1498: -0.25445, 192, a putative oxidoreductase, 
similar to N- terminal part of oxidoreductase [aldo/keto 
reductase family] (amino acids at the position 5-192/286), and 
2640 similar to [Thermotoga maritima] gi | 7431104 | pir | | A72308
SEQ ID NO: - : -0.289344, 1418, a putative invasin, similar
to putative membrane protein b1978 [Escherichia coli]
gi | 7466779 | pir | | D64962 (32% identity in 1352 amino acids)

2645 and similar to invasins, for example, [Yersinia pestis]
gi | 726319 | gb | aaA96352.1 | (36% identity in 661 amino acids),
and similar to intimins, for example, [Escherichia coli strain
4221] gi | 1947048 | gb | aa SEQ ID NO: acid B52913.1 | [sic,
gi | 1947048 | gb | aaB52913.1 | ] (30% identity in 874 amino acids)

2650 SEQ ID NO: - : -0.170242, 290, a putative reductase, similar
to reductases, for example, oxidoreductase, [Thermotoga
maritima] gi | 7431104 | pir | | A72308 (46% identity in 281 amino
acids)

SEQ ID NO: 1479: 0.107317, 83, novel, similar to hypothetical
2655 protein YaiU [Escherichia coli]

gi | 2495526 | sp | P75700 | YAIU#ECOLI (37% identity in 54 amino
acids) [putative flagellin structural protein in HemB-sbmA
intergenic region]

SEQ ID NO: 1480: -0.156319, 365, a putative adhesin, similar to
2660 high molecular weight adhesin, for example, HmwA
[Haemophilus influenzae]
gi | 5929966 | gb | aaD56660.1 | AF180944#1 (19% identity in 199
amino acids)

SEQ ID NO: 1481: -0.088933, 254, novel

2665 SEQ ID NO: 1482: -0.235772, 124, novel, similar to a part of
hypothetical protein [Escherichia coli]

gi | 2506596 | sp | P21514 | YAHA#ECOLI (48% identity in 38
amino acids); and similar to regulatory elements, for
example, BvgA [Bordetella bronchiseptica]

2670 gi | 115157 | sp | P16574 | BVGA#BORPE (44% identity in 49 amino
acids), GTG start 

SEQ ID NO: 1483: 0.530909, 56, novel 

SEQ ID NO: 1484: -0.632692, 53, a putative fimbriae regulatory
protein, similar to invertase (partial), C-terminal part of type 1
2675 fimbriae regulatory proteins, for example, FimE [Escherichia
coli K-12] gi | 120187 | sp | P04741 | FIME#ECOLI (73% identity in
49 amino acids); and FimB [Escherichia coli]
gi | 729489 | sp | P04742 | FIMB#ECOLI (63% identity in 75 amino
acids)

2680 SEQ ID NO: 1485: -0.385069, 147, a putative fimbriae
regulatory protein, invertase, similar to a part of type 1
fimbriae regulatory proteins, for example, FimB [Escherichia
coli K-12] gi | 729489 | sp | P04742 | FIMB#ECOLI (49% identity in
114 amino acids); and FimE [Escherichia coli]

2685 gi | 120167 | sp | P04741 | FIME#ECOLI (42% identity in 113
amino acids), TTG start, probably interrupted
SEQ ID NO: 1486: 1.684091, 45, novel
SEQ ID NO: - : 0.114286, 50, novel

SEQ ID NO: 1500: -0.450414, 1328, a putative adhesin, similar
2690 to AidA-I adhesin precursors, for example, [Escherichia coli
plasmid F] gi | 8918851 | dbj | Baa97898.1 | (45% identity in
1179 amino acids); similar to IgA1 protease homolog MisL
[Salmonella typhimurium pathogenicity island SPI-3]
gi | 4324810 | gb | aaD16954.1 | (39% identity in 768 amino acids);
2695 and similar to VirG [Shigella flexneri] gi | 96922 | pir | | A32247
(31% identity in 1014 amino acids)

SEQ ID NO: 1502: -0.081707, 329, a putative sugar-binding 
protein, similar to sugar-binding proteins, for example, b1516
[Escherichia coli] gi | 7466925 | pir | | G64905 (27% identity in

2700 309 amino acids) 

SEQ ID NO: 1503: -0.030233, 87, a putative ABC transporter 
ATP-binding protein, similar to N-terminal part of ABC 
transporter ATP-binding protein, for example, [Streptomyces 
coelicolor A3(2)] gi | 7479110 | pir | | T34924 (48% identity in 82

2705 amino acids) [also to AraG-E. coli]

SEQ ID NO: 1504: 0.144865, 371, a putative ABC transporter 
ATP-binding protein, similar to C-terminal part of sugar ABC 
transporter ATP-binding proteins, for example, [Bacillus 
subtilis] gi | 7404442 | sp | P36947 | RBSA#BACSU (36% identity

2710 in 380 amino acids)

SEQ ID NO: 1505: 0.929412, 324, a putative ABC transporter
(permease), similar to ABC transport system permeases, for
example, RbsC [Bacillus subtilis] gi | 7446897 | pir | | B69690 
(34% identity in 299 amino acids), and [Escherichia coli] 

2715 gi | 400960 | sp | P04984 | RBSC#ECOLI (31% identity in 298 
amino acids) 

SEQ ID NO: - : 1.081132, 319, a putative ABC transport

system permease, similar to ABC transport system permeases,
for example, RbsC [Escherichia coli] gi | 78833 | pir | | C26304

2720 (35% identity in 291 amino acids), and [Bacillus subtilis]
gi | 7446897 | pir | | B69690 (34% identity in 290 amino acids)
SEQ ID NO: - : -0.118928, 318, a putative transcription

regulatory element, similar to araC-family transcription
regulatory elements, for example, AdpA [Streptomyces

2725 coelicolor A3(2)] gi | 7544056 | emb | CAB87229.1 (39% identity in
311 amino acids)

SEQ ID NO: 1606: -0.14084, 263, similar to YDDR#BACSU
gi | 7474951 | pir | | H69776 (47% identity in 259 amino acids)
SEQ ID NO: 1360: -0.236079, 353, probably an ABC transporter 
2730 ATP-binding protein (probably ferric transport system), similar 
to ABC transporter ATP-binding proteins, for example, AfuC 
[Escherichia coli K-12] gi | 2506109 | sp | P37009 | AFUC#ECOLI 
(94% identity in 352 amino acids) 

SEQ ID NO: 1361: 0.860259, 693, a putative ferric transport
2735 system permease, similar to ferric transport system permeases,
for example, AfuB [Actinobacillus pleuropneumoniae]
gi | 7387527 | sp | Q44123 | AFUB#ACTPL (66% identity in 671
amino acids) 

SEQ ID NO: 1362 : -0.371429, 344, a putative 
2740 periplasmic-iron-binding protein, similar to 

periplasmic-iron-binding proteins, for example, AfuA 
[Actinobacillus pleuropneumoniae] gi | 1469286 | gb | aa B0 5032. 1 j 
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(72% identity in 343 amino acids) 

SEQ ID NO: 1363 : 0.585714, 435, a putative regulatory
2745 element, similar to hexose phosphate transport
system regulatory proteins, for example, UhpC [Escherichia coli
K-12] gi | 136770 | sp | P09836 | UHPC#ECOLI (53% identity in
415 amino acids)

SEQ ID NO: 1364: 0.329436, 514, a putative sensor histidine 
2750 protein kinase, similar to sensor protein kinases, for example, 
hexose phosphate transport system sensor protein UhpB
[Escherichia coli K-12] gi | 7429062 | pir | | RGECUB (35%
identity in 497 amino acids)

SEQ ID NO: 1365 : 0.151196, 210, a putative transcription 
2755 regulatory element (probably a response regulatory element), 
similar to transcription regulatory elements, for example, 
hexose phosphate transport system regulatory protein 
UhpA [Salmonella typhimurium]
gi | 136767 | sp | P27667 | UHPA#SALTY (49% identity in 202
2760 amino acids); and UhpA [Escherichia coli]
gi | 136766 | sp | P10940 | UHPA#ECOLI (48% identity in 202
amino acids)

SEQ ID NO: - : 0.595302, 150, novel 
SEQ ID NO: 1625: -0.624948, 482, novel 
2765 SEQ ID NO: 1697: -0.57125, 81, novel, similar to a part of 
hypothetical protein [Yersinia enterocolitica] 

gi | 3511032 | gb | aaC33681.1 (at the position 1-70 of 80 amino 
acids) (45% identity in 70 amino acids) 

SEQ ID NO: 1698: -0.341936, 94, novel, similar to hypothetical 
2770 protein (99 amino acids) [Yersinia pestis]
gi | 3822096 | gb | aaC69816.1 (35% identity in 89 amino acids)
SEQ ID NO: 1602: -0.638432, 524, novel 

SEQ ID NO: 1056: -0.363636, 452, a putative transporter (an 
outer membrane protein), similar to outer membrane 
2775 transporter proteins, for example, CyaE protein [Bordetella 
pertussis] gi | 117799 | sp | P11092 | CYAE#BORPE (25% identity



Appendix B: Hideo et al. Full Translation



in 385 amino acids) 

SEQ ID NO: 1057 : 0.097741, 1462, novel, similar to 
hypothetical proteins, for example, [Synechocystis sp. strain 
2780 PCC 6803] gi | 7469433 | pir | | S76109 (33% identity in 1384 
amino acids); similar to RTX protein [Aeromonas salmonicida]
gi | 6752871 | gb | aaF27914.1 | AF218037#1 (33% identity in 1384
amino acids) 

SEQ ID NO: 1058 : - , 5292, novel, similar to
2785 hypothetical proteins, for example, [Synechocystis sp. strain
PCC 6803] gi | 7469433 | pir | | S76109 (36% identity in 2014
amino acids), and similar to RTX protein [Aeromonas
salmonicida] gi | 6752871 | gb | aaF27914.1 | AF218037#1 (36%
identity in 2051 amino acids); hemagglutinin [Streptococcus
2790 gordonii] gi | 8885520 | dbj | Baa97453.1 | (35% identity in 2056
amino acids), GTG start

SEQ ID NO: 1059 : 0.082011, 707, a putative transporter, 
similar to transporters (ATP-binding proteins), for example, 
LktB [Actinobacillus
2795 actinomycetemcomitans] gi | 126357 | sp | P23702 | HLYB#ACTAC
(26% identity in 690 amino acids)

SEQ ID NO: - : -0.275448, 392, a putative transporter, 

similar to membrane fusion proteins, for example, 
[Sinorhizobium meliloti] gi | 4689001 | emb | CAB41456.1 | (28%
2800 identity in 372 amino acids) 

SEQ ID NO: 1559: -0.082857, 141, novel 
SEQ ID NO: 1560: 0.236364, 56, novel 

SEQ ID NO: 1561 : -0.525147, 339, a putative adhesin/invasin, 
similar to surface protein [Xylella fastidiosa]
2805 gi | 9106565 | gb | aaF84338.1 | AE003982#11 (22% identity in 313
amino acids); and putative adhesin/invasin [Neisseria
meningitidis MC58] gi | 7227256 | gb | aaF42321.1 | (23% identity
in 337 amino acids)

SEQ ID NO: 1562: -0.5825, 121, novel 
2810 SEQ ID NO: - : -0.746575, 74, novel, similar to a part of
hypothetical protein YahH [Escherichia coli]
gi | 2495514 | sp | P75690 | YAHH#ECOLI (69% identity in 23
amino acids) 

SEQ ID NO: 1303: -0.35, 379, an H repeat-associated protein, 
2815 similar to H repeat-associated protein in RhsB element 
[Escherichia coli] gi | 140772 | sp | P28912 | YHHI#ECOLI (97%
identity in 378 amino acids) 

SEQ ID NO: 1304: -0.745946, 445, an Rhs protein, similar to 
putative Rhs protein [Streptomyces coelicolor A3(2)]
2820 gi | 7321289 | emb | CAB82067.1 | (34% identity in 285 amino
acids); and RhsE protein - E. coli gi | 2507113 | sp | P24211 | (36%
identity in 139 amino acids), GTG start
SEQ ID NO: 1305: -0.224444, 136, novel 

SEQ ID NO: 1306: -0.577477, 1617, an Rhs protein, similar to 
2825 putative Rhs protein [Streptomyces coelicolor A3(2)]
gi | 7321289 | emb | CAB82067.1 | (30% identity in 857 amino
acids); and RhsH protein [Escherichia coli strain ec45]
gi | 2920634 | gb | aaC32471.1 | (25% identity in 919 amino acids)
SEQ ID NO: 1307: -0.498693, 154, novel 
2830 SEQ ID NO: 1308: -0.509795, 634, a putative Vgr protein, 
similar to Vgr protein, for example, [Escherichia coli strain 
ec11] gi | 2920640 | gb | aaC32475.1 | (93% identity in 529 amino
acids)

SEQ ID NO: 1474: -0.281303, 354, similar to YBGO#ECOLI 
2835 gi | 1786935 (87% identity in 353 amino acids), but [having]
different N-terminus

SEQ ID NO: 1475: -0.419342, 244, similar to YBGP#ECOLI 
gi | 1786936 (78% identity in 242 amino acids) [putative
chaperone] 

2840 SEQ ID NO: 1476: -0.430567, 724, similar to N-terminal part 
of YBGQ#ECOLI gi | 1786937 (amino acids at the position
1-723/818) (84% identity in 723 amino acids) [putative outer
membrane protein] 

SEQ ID NO: 1477: -0.026943, 194, similar to YBGD#ECOLI
2845 gi | 1786938 (79% identity in 188 amino acids) [putative
fimbrial-like protein]

SEQ ID NO: 1275 : -0.0701, 302, a putative transcription
regulatory element, similar to transcription regulatory 
elements, for example, glycine cleavage system transcription 
2850 activator (gcv operon activator) - Escherichia coli 
gi | 417043 | sp | P32064 | GCVA#ECOLI (31% identity in 300 
amino acids) 

SEQ ID NO: 1276 : -0.4, 201, a putative cob(I)alamin 
adenosyltransferase, similar to cob(I)alamin
2855 adenosyltransferases (corrinoid adenosyltransferases), for
example, [Escherichia coli]
gi | 115148 | sp | P13040 | BTUR#ECOLI (67% identity in 200
amino acids) 

SEQ ID NO: 1277 : -0.259636, 551, a putative fumarate 
2860 hydratase, similar to fumarate hydratases, for example, 
fumarate hydratase class I, aerobic (fumarase) - Escherichia 
coli gi | 120598 | sp | P00923 | FUMA#ECOLI (68% identity in 545 
amino acids) 

SEQ ID NO: 1278: 0.92183, 427, a putative transporter protein, 
2865 similar to glutamate/aspartate transporter proteins (proton
glutamate symport proteins), for example, [Bacillus
stearothermophilus] gi | 121467 | sp | P24943 | GLTT#BACST (38%
identity in 416 amino acids), and similar to
C4-dicarboxylate transporter proteins, for example, [Rhizobium
2870 leguminosarum]
gi | 231980 | sp | Q01857 | DCTA#RHILE (37% identity in 400
amino acids) 

SEQ ID NO: 1279: -0.126667, 106, novel 

SEQ ID NO: 1280 : -0.052632, 457, novel, similar to an 
2875 unnamed protein product [Citrobacter amalonaticus]
gi | 3184398 | dbj | Baa28710.1 | (93% identity in 284 amino acids)
SEQ ID NO: 1281 : -0.051816, 414, a 3-methylaspartate
ammonia-lyase (beta-methylaspartase), similar to
3-methylaspartate ammonia-lyases (beta-methylaspartases), for
example, [Citrobacter amalonaticus]
gi | 3184397 | dbj | Baa28709.1 | (93% identity in 413 amino
acids); and [Clostridium tetanomorphum] 

gi | 729971 | sp | Q05514 | MaaL#CLOTT (55% identity in 409 
amino acids) 

SEQ ID NO: 1282 : "0.214345, 482, a probable glutamate 
mutase E (methylaspartate mutase E), similar to glutamate 
mutases, for example, [Citrobacter amalonaticus] 
gi | 3184396 | dbj | Baa28708.1 | (90% identity in 481 amino acids),
and [Clostridium tetanomorphum] 

gi|729586|sp|Q05509|GLME#CLOTT (57% identity in 481 
amino acids) 

SEQ ID NO: 1283 : -0.058875, 463, a probable glutamate 
mutase L (methylaspartate mutase L), similar to glutamate 
mutase L (methylaspartate mutase L), for example, 
[Clostridium tetanomorphum] gi | 444421 | prf | | 1907157C (32% 
identity in 449 amino acids) 

SEQ ID NO: 1284: 0.061074, 150, a probable glutamate mutase 
S (methylaspartate mutase S), similar to glutamate mutase S 
(methylaspartate mutase S), for example, [Clostridium
cochlearium] gi | 7245512 | pdb | 1CCW | A (57% identity in 156
amino acids) 



SEQ ID NO: 1285: -0.278182, 56, novel
SEQ ID NO: 1286: -0.114286, 141, novel
SEQ ID NO: 1287: -0.327388, 315, novel

SEQ ID NO: 928: -0.906945, 73, an excisionase, identical to
excisionase [Bacteriophage HK022]
gi | 1722835 | sp | P11683 | VXIS#BP434; and similar to
excisionase [Bacteriophage lambda]
gi | 139680 | sp | P03699 | VXIS#LAMBD (98% identity in 72 amino
acids)

SEQ ID NO: 929: -0.565455, 56, novel, similar to hypothetical 
protein ORF55 [Bacteriophage 434] gi | 801889 | gb | aaA67903.1 |
(98% identity in 55 amino acids)

SEQ ID NO: 930: -0.0725, 41, novel, similar to hypothetical
2915 protein ORF-91 [phage 434] gi | 93720 | pir | | A27354 (82%
identity in 28 amino acids) 

SEQ ID NO: 931 : 0.247159, 177, novel [putative membrane 
protein; IMP]

SEQ ID NO: 932: -0.605479, 74, novel, similar to C4-type zinc 
2920 finger proteins (TraR family), for example, 

gi | 7649830 | dbj | Baa94108.1 | (98% identity in 73 amino acids)
SEQ ID NO: 933: -0.346237, 94, novel, similar to hypothetical 
proteins, for example, [Bacteriophage 933W] 

gi | 5881602 | dbj | Baa84293.1 | (97% identity in 93 amino acids); 
2925 and orf61 [Bacteriophage lambda] (95% identity in 46 amino 
acids) 

SEQ ID NO: 934: -0.079365, 64, novel, similar to hypothetical 
proteins, for example, [Bacteriophage VT2-Sa] 

gi | 5881603 | dbj | Baa84294.1 | (96% identity in 61 amino acids),
2930 and orf63 [Bacteriophage lambda] gi | 508994 | gb | aaA96567.1 |
(92% identity in 63 amino acids) 

SEQ ID NO: 935: -0.246667, 61, novel, similar to hypothetical 
protein, for example, [Bacteriophage 933W] 

gi | 4585389 | gb | aaD25417.1 | AF125520#12 (95% identity in 60
2935 amino acids) and orf60a [Bacteriophage lambda]
gi | 508995 | gb | aaA96568.1 | (93% identity in 60 amino acids)
SEQ ID NO: 936: -0.359735, 227, an exonuclease, similar to 
exonucleases, for example, [Bacteriophage lambda] 
gi | 119702 | sp | P03697 | EXO#LAMBD (98% identity in 226 amino 

2940 acids) 

SEQ ID NO: 937: -1.293333, 61, novel, similar to NinE proteins, 
for example, [Bacteriophage 21] gi | 4539480 | emb | CAB39989.1 | 
(95% identity in 60 amino acids) 

SEQ ID NO: 938: -0.675, 57, novel, similar to NinF proteins, 
2945 for example, [Bacteriophage 21] gi | 4539481 | emb | CAB39990.1 |
(92% identity in 56 amino acids), GTG start 




SEQ ID NO: 939 : -1.100483, 208, novel, similar to NinG 
proteins, for example, [Bacteriophage 21] 

gi | 4539482 | emb | CAB39991.1 | (95% identity in 204 amino
2950 acids)

SEQ ID NO: 940 : -0.243891, 222, a serine/threonine
protein phosphatase, similar to serine/threonine
protein phosphatase, for example, [Bacteriophage lambda]
gi | 130792 | sp | P03772 | PP#LAMBD (95% identity in 221 amino
2955 acids)

SEQ ID NO: 941 : -0.257367, 320, novel, [a putative outer 
membrane protein; OMP], similar to putative outer membrane 
protein [Helicobacter pylori (strain J99)]
gi | 7465285 | pir | | H71907 (19% identity in 297 amino acids)
2960 (at low level) 

SEQ ID NO: 942: -0.396506, 230, antitermination, similar to 
antiterminators, for example, protein Q [Bacteriophage 82] 
gi | 132277 | sp | P13870 | RegQ#BP82 

SEQ ID NO: 943: 0.576577, 223, novel, [hypothetical membrane 
2965 protein; IMP], similar to orf14 [Actinobacillus
actinomycetemcomitans] gi | 7592819 | dbj | Baa94406.1 | (29%
identity in 228 amino acids); and TfpB protein [Moraxella 
bovis] gi | 141258 | sp | P20666 | TFPB#MORBO (26% identity in 
190 amino acids) 
2970 SEQ ID NO: 944: -0.288636, 133, novel 

SEQ ID NO: 945: 0.109859, 72, a holin protein, similar to holin proteins,
for example, [Bacteriophage 933W]
gi | 4499808 | emb | CAB39307.1 | (92% identity in 71 amino acids)
SEQ ID NO: 946 : -0.186061, 166, an endolysin (lysozyme), 
2975 similar to endolysins (lysozyme), for example, R protein 
[Bacteriophage 21] gi | 67436 | pir | | LZBP21 (93% identity in 165
amino acids) 

SEQ ID NO: 947: -0.409678, 156, novel, GTG start 
SEQ ID NO: 948: -0.060294, 69, a ribosome protein L31-like
2980 protein, similar to hypothetical proteins, for example, ribosome
protein L31 homolog ykgM in intF-eaeH intergenic region
[Escherichia coli K-12] gi | 3025204 | sp | P71302 | YKGM#ECOLI
(93% identity in 86 amino acids), GTG start
SEQ ID NO: 949: 0.736, 51, novel, GTG start 
2985 SEQ ID NO: 950 : 0.613043, 93, putative colicin immunity 
protein, similar to colicin immunity proteins, for example,
colicin E1 immunity protein
gi | 124395 | sp | P02985 | IMM1#ECOLI (25% identity in 107
amino acids)

2990 SEQ ID NO: 951: -0.444172, 164, novel, [a putative membrane 
protein; IMP], similar to hypothetical protein MAL4P2.26 
[Plasmodium falciparum] gi | 6562728 | emb | CAB62867.1 | (29%
identity in 106 amino acids) (at low level) 
SEQ ID NO: 952: -0.572571, 701, novel 

2995 SEQ ID NO: 953: -0.84, 71, novel 

SEQ ID NO: 954: -0.437433, 375, novel, similar to C-terminal 
part of hypothetical protein, for example, [Pseudomonas putida] 
gi | 2995633 | gb | aaC98738.1 | (40% identity in 200 amino acids); 
and L0015 [Escherichia coli] gi | 3414883 | gb | aaC31494.1 | 

3000 (39% identity in 200 amino acids), GTG start 

SEQ ID NO: 955: -1.301176, 86, novel, similar to hypothetical 
protein, for example, orf29 [Escherichia coli] 
gi | 6009405 | dbj | Baa84864.1 | (37% identity in 136 amino 
acids); and L0013 [Escherichia coli] 

3005 gi | 3414881 | gb | aaC31492.1 | (38% identity in 124 amino acids)
SEQ ID NO: 956: -0.21966, 708, novel, similar to hypothetical 
proteins, for example, orf50 [Escherichia coli] 
gi | 6009426 | dbj | Baa84885.1 | (71% identity in 106 amino
acids); and L0014 [Escherichia coli]
3010 gi | 3288157 | emb | Caa11510.1 | (64% identity in 116 amino
acids) 

SEQ ID NO: 957: 0.07541, 123, novel, similar to hypothetical 
proteins, for example, L0015 [Escherichia coli]
gi | 3414883 | gb | aaC31494.1 | (61% identity in 503 amino acids)






3015 SEQ ID NO: 958: -0.213187, 92, novel, similar to hypothetical 
proteins, for example, 57.8 kD protein [Pseudomonas
putida] gi | 2496740 | sp | P55630 | Y4QI#RHISN (37% identity in
232 amino acids) 

SEQ ID NO: 959: -0.348958, 193, novel, similar to hypothetical 
3020 protein, for example, 20.3K protein [Agrobacterium tumefaciens
181131] gi | 95090 | pir | | JC1151 (41% identity in 101 amino
acids) 

SEQ ID NO: 960: -0.065414, 134, novel 

SEQ ID NO: 961 : -0.125911, 248, immunity to R478 
3025 phage/colicin/tellurite resistance cluster, similar to TerW 
[plasmid R478] gi | 1354147 | gb | aaC44736.1 | (99% identity in
155 amino acids) 

SEQ ID NO: 962: -0.134375, 129, novel 

SEQ ID NO: 963: -0.372477, 110, novel, similar to hypothetical 
3030 proteins, for example, [Deinococcus radiodurans] 
gi | 7472167 | pir | | B75302 (42% identity in 305 amino acids)
SEQ ID NO: 964 : -0.581686, 1022, novel, similar to 
hypothetical proteins, for example, [Streptomyces coelicolor 
A3(2)] gi | 7472048 | pir | | A75302 (34% identity in 260 amino
3035 acids)

SEQ ID NO: 965: -0.305505, 110, novel, similar to hypothetical 
proteins, for example, [Streptomyces coelicolor A3(2)] 
gi | 8246803 | emb | CAB92838.1 | (45% identity in 97 amino acids)
SEQ ID NO: 966: -0.476724, 233, novel, similar to hypothetical
3040 proteins, for example, [Serratia marcescens]
gi | 1695868 | gb | aaB37122.1 | (100% identity in 167 amino acids)
SEQ ID NO: 967: -0.431156, 200, novel, similar to hypothetical proteins,
for example, [Serratia marcescens]
gi | 1695869 | gb | aaB37123.1 | (99% identity in 197 amino acids);
3045 and [Deinococcus radiodurans (strain R1)]
gi | 7471591 | pir | | F75301 (38% identity in 364 amino acids)
SEQ ID NO: 968: 0.120465, 216, novel, similar to hypothetical
proteins, for example, [Serratia marcescens]
gi | 1695870 | gb | aaB37124.1 | (99% identity in 173 amino acids);
3050 [Serratia marcescens] gi | 1695871 | gb | aaB37125.1 | (98%
identity in 53 amino acids); and [Deinococcus radiodurans]
gi | 7471522 | pir | | E75301 (28% identity in 286 amino acids)
SEQ ID NO: 969 : -0.357696, 1138, possible tellurium 
resistance, similar to TerZ protein, for example, [Serratia 

3055 marcescens] gi | 6094454 | sp | Q52353 | (98% identity in 193 
amino acids) 

SEQ ID NO: 970: -0.31005, 200, a tellurium resistance, similar
to TerA protein, for example, [Serratia marcescens] 
gi | 5702379 | gb | aaD47285.1 | AF168355#3 (67% identity in 385 

3060 amino acids) 

SEQ ID NO: 971: -0.739041, 439, tellurite resistance, similar 
to TerB protein, for example, [Serratia marcescens] 
gi | 950680 | gb | aaA86848.1 | (100% identity in 151 amino acids)
SEQ ID NO: 972: -0.284314, 103, tellurium resistance, similar 

3065 to TerC protein, for example, [Serratia marcescens] 
gi | 6226214 | sp | Q52356 | TERC#SERMA (100% identity in 346 
amino acids) 

SEQ ID NO: 973: -0.460736, 327, tellurium resistance, similar 
to TerD protein, for example, [Serratia marcescens]
3070 gi | 6094448 | sp | Q52357 | TERD#SERMA (100% identity in 192 
amino acids) 

SEQ ID NO: 974: -0.541515, 331, possible tellurium resistance, 
identical to gi | 7108482 | gb | aaF36434.1 | AF126104#3
TLRB#ECOLI (100% identity in 191 amino acids); and similar to
3075 TerE protein, for example, [Serratia marcescens] 
gi | 6094449 | sp | Q52358 | TERE#SERMA (98% identity in 191 
amino acids) 

SEQ ID NO: 975: -0.394881, 294, novel 

SEQ ID NO: 976: 0.154545, 45, tellurium resistance, identical 
3080 to gi | 7108481 | gb | aaF36433.1 | AF126104#2 TRLA#ECOLI
(100% identity in 102 amino acids); and similar to TerF protein,
for example, [Serratia marcescens]
gi | 7387491 | gb | aaA86852.2 | TERF#SERMA (94% identity in
102 amino acids)
SEQ ID NO: 977: -0.360345, 233, novel, GTG
3085 start

SEQ ID NO: 1550 : -0.338059, 671, an adhesin, similar to 
Iha adhesin [Escherichia coli O157:H7 strain 86-24]
gi | 7108480 | gb | aaF36432.1 | AF126104#1 IHA#ECOLI (99%
identity in 696 amino acids); and exogenous ferric siderophore
3090 receptor R4 [Escherichia coli strain CFT073]
gi | 3661500 | gb | aaC61730.1 | (99%
identity in 669 amino acids) 

SEQ ID NO: 1665: 0.638415, 165, novel, similar to a part of 
hypothetical protein [Shigella flexneri]
3095 gi | 5880472 | gb | aaD54665.1 | AF097520#3 (44% identity in 40
amino acids) 

SEQ ID NO: 1517: 0.82528, 448, novel, similar to C-terminal 
part of ShiA [Shigella flexneri]
gi | 5532447 | gb | aaD44731.1 | AF141323#2 (49% identity in 73
3100 amino acids); TTG start

SEQ ID NO: 1518: 0.075472, 107, novel 

SEQ ID NO: 1519: -0.587221, 494, novel 

SEQ ID NO: 1567: -0.283051, 414, novel, TTG start 

SEQ ID NO: 1568: 0.021192, 152, novel, GTG start 

3105 SEQ ID NO: - : 0.033871, 63, novel, TTG start 
SEQ ID NO: 411: -0.575221, 340, novel 
SEQ ID NO: 412: 0.496, 51, novel

SEQ ID NO: 413 : -0.713974, 824, a possible 

glucosyl-transferase, similar to glucosyl-transferases, for 
3110 example, [Salmonella typhi] gi | 7467230 | pir | | T30292 (72% 
identity in 366 amino acids) 

SEQ ID NO: 414: 0.095238, 64, a putative ferric enterochelin 
esterase (partial), similar to C-terminal part of ferric 
enterochelin esterases, for example, [Salmonella enterica]
3115 gi | 2738250 | gb | aaC46181.1 | (66% identity in 68 amino acids), TTG
start

SEQ ID NO: 415 : -0.280645, 63, a transposase, similar to
transposases, for example, [Shigella boydii]
gi | 2197010 | gb | aaB61273.1 | (100% identity in 167 amino acids)
3120 SEQ ID NO: 416: -0.108911, 102, a possible repressor, similar 
to InsA protein, for example, [insertion sequence IS1F]
gi | 124915 | sp | P19767 | ISA2#ECOLI (98% identity in 91 amino
acids), GTG start 

SEQ ID NO: 417 : -0.490164, 62, novel [putative membrane 
3125 protein; IMP]
SEQ ID NO: 418: -0.37, 51, novel

SEQ ID NO: 419: -0.735659, 130, novel, GTG start 
SEQ ID NO: 420 : -0.62381, 43, novel, similar to sensor 
regulatory element protein HutT [Rhodobacter capsulatus] 
gi | 1075537 | pir | | A49938 (33% identity in 97 amino acids) (at 
3130 low level) 

SEQ ID NO: 421: -0.882353, 52, novel 
SEQ ID NO: 422: -0.729167, 73, novel 

SEQ ID NO: 423: -0.036842, 96, transposase (OrfB), similar to 
transposases, for example, [insertion sequence IS629]
3135 gi | 7443863 | pir | | T00315 (98% identity in 295 amino acids)

SEQ ID NO: 424: -0.433333, 64, transposase (OrfA), similar to 
hypothetical proteins, for example, [Escherichia coli plasmid 
pO157 insertion sequence IS629] gi | 7444868 | pir | | T00241
(96% identity in 108 amino acids)

3140 SEQ ID NO: 425 : -0.6728, 126, an HecB-like protein, its 
N-terminal-half part is similar to N-terminal part of
hemolysin activation protein HecB [Neisseria meningitidis
MC58] gi | 7227016 | gb | aaF42103.1 | (34% identity in 151 amino
acids) 

3145 SEQ ID NO: 426 : -0.534445, 91, novel 

SEQ ID NO: 427: -0.372341, 142, novel, similar to a part of 
tRNA-splicing endonuclease positive effector [fission yeast] 
gi | 7493527 | pir | | T40065 (22% identity in 531 amino acids) (at
low level); and similar to hypothetical protein, for example,
3150 [Aquifex aeolicus] gi | 7514764 | pir | | D70476 (24% identity in
271 amino acids) (at low level)

SEQ ID NO: 428: -0.229139, 152, novel, TTG start 
SEQ ID NO: 429: -0.721212, 364, novel, similar to hypothetical 
proteins, for example, YbdN [Escherichia coli] 
3155 gi | 3024984 | sp | P77216 | YBDN#ECOLI (58% identity in 396 
amino acids) 

SEQ ID NO: 430 : -0.4, 249, novel, similar to hypothetical 
protein YbdM [Escherichia coli] 

gi | 3024983 | sp | P77174 | YBDM#ECOLI (58% identity in 212
3160 amino acids) 

SEQ ID NO: 431 : -0.385547, 257, a transcription regulatory 
element, similar to PerC (BfpW) [Escherichia coli]
gi | 1172431 | sp | P43475 | PERC#ECOLI (25% identity in 83 
amino acids) 

3165 SEQ ID NO: 432 : -0.49854, 138, novel, similar to 
exopolyphosphatase [Pseudomonas aeruginosa] 

gi | 4200042 | dbj | Baa74460.1 | (32% identity in 56 amino acids) 
(at low level) 

SEQ ID NO: 433: -0.133074, 258, novel 

3170 SEQ ID NO: 434: 1.383019, 54, novel, its N-terminal part is 
similar to BfpM [Escherichia coli] gi | 847983 | gb | aaC44052.1 |
BFPM#ECOLI (52% identity in 113 amino acids); its
N-terminal part is similar to putative transposase [Vibrio
cholerae] gi | 7467523 | pir | | T09435 (55% identity in 68 amino
3175 acids); and its C-terminal part is similar to a part of
hypothetical protein [Escherichia coli O157:H7]
gi | 7649865 | dbj | Baa94143.1 | (98% identity in 82 amino acids)
SEQ ID NO: 435 : 0.16, 46, novel, similar to hypothetical 
protein [Pseudomonas syringae] gi | 1196744 | gb | aaA88435.1 |

3180 (34% identity in 50 amino acids) (at low level) 

SEQ ID NO: 436: 0.065714, 71, novel, similar to hypothetical 
protein, for example, orf29 [Escherichia coli]
gi | 6009405 | dbj | Baa84864.1 | (40% identity in 131 amino
acids); and L0013 [Escherichia coli]
3185 gi | 3414881 | gb | aaC31492.1 | (38% identity in 130 amino acids)
SEQ ID NO: 437: -0.96087, 93, novel 

SEQ ID NO: 438: -0.462461, 326, novel, similar to hypothetical 
protein, for example, yfjP protein [Escherichia coli] 
gi | 7449539 | pir | | B65042 (49% identity in 289 amino acids);
3190 and yeeP protein [Escherichia coli]
gi | 2495624 | sp | P76359 | YEEP#ECOLI (95% identity in 183
amino acids) 

SEQ ID NO: 439: -0.405691, 124, a putative adhesin, similar to 
outer membrane fluffing protein [Escherichia coli] 

3195 gi | 7466262 | pir | | G64964 (68% identity in 927 amino acids);
and similar to glycoprotein [Escherichia coli strain H10407]
gi | 5305639 | gb | aaD41751.1 | (34% identity in 608 amino acids)
(at low level); and similar to Adhesin AIDA-I precursor 
[Escherichia coli plasmid pIB6] 

3200 gi | 543788 | sp | Q03155 | AIDA#ECOLI (23% identity in 678 
amino acids) 

SEQ ID NO: 440: -0.14065, 124, novel, similar to hypothetical
protein YjDA [Escherichia coli]
gi | 731985 | sp | P16694 | YJDA#ECOLI (32% identity in 793

3205 amino acids) 

SEQ ID NO: 441 : 0.970589, 273, novel, similar to hypothetical 
protein YjcZ [Escherichia coli] 

gi | 731984 | sp | P39267 | YJCZ#ECOLI (30% identity in 278 amino 
acids), GTG start 

3210 SEQ ID NO: 442: 0.125316, 80, novel 
SEQ ID NO: 443: 0.024615, 196, novel 

SEQ ID NO: 444: -0.242045, 617, novel, similar to hypothetical 
proteins, for example, YfjQ [Escherichia coli] 
gi | 1723629 | sp | P52132 | YFJQ#ECOLI (73% identity in 271
3215 amino acids); and YafZ [Escherichia coli]
gi | 2495487 | sp | P77206 | YAFZ#ECOLI (73% identity in 271
amino acids) 

SEQ ID NO: 445: -0.965741, 109, novel, similar to hypothetical
proteins, for example, YafK [Escherichia coli]
3220 gi | 2495486 | sp | P75676 | YAFX#ECOLI (71% identity in
144 amino acids); and YfjX [Escherichia coli]
gi | 1723636 | sp | P52139 | YFJX#ECOLI (75% identity in 137
amino acids) 

SEQ ID NO: 446 : -0.635945, 218, a putative DNA repair 
3225 protein (RadC family), similar to putative RadC family proteins, 
for example, YkfG [Escherichia coli] 

gi | 3025218 | sp | Q47685 | YKFG#ECOLI (81% identity in 158
amino acids); and YeeS [Escherichia
coli] gi | 3025155 | sp | P76362 | YEES#ECOLI (98% identity in 148
3230 amino acids) 

SEQ ID NO: 447: -0.957693, 105, novel, similar to hypothetical 
protein YeeT [Escherichia coli] 

gi | 3025156 | sp | P76363 | YEET#ECOLI (97% identity in 73
amino acids) 

3235 SEQ ID NO: 448: 0.214754, 62, novel, similar to hypothetical 
proteins, for example, YeeU [Escherichia coli] 
gi | 3025157 | sp | P76364 | YEEU#ECOLI (89% identity in 
115 amino acids); and YfjZ [Escherichia coli]
gi | 1723638 | sp | P52141 | YFJZ#ECOLI (66% identity in 98 amino

3240 acids), GTG start 

SEQ ID NO: 449: -0.298065, 156, novel, similar to hypothetical 
proteins, for example, L0007 [Escherichia coli] 
gi | 3414875 | gb | aaC31486.1 | (93% identity in 124 amino acids);
YeeV [Escherichia coli] gi | 3025158 | sp | P76365 | YEEV#ECOLI
3245 (87% identity in 124 amino acids); and YkfI [Escherichia coli]
gi | 3025213 | sp | P77692 | YKFI#ECOLI (58% identity in 112
amino acids) 

SEQ ID NO: 450: 0.945946, 38, novel, similar to hypothetical 
proteins, for example, L0008 [Escherichia coli] 
3250 gi | 3414876 | gb | aaC31487.1 | (94% identity in 163 amino acids);
and YeeW [Escherichia coli]
gi | 3025160 | sp | P76366 | YEEW#ECOLI (65% identity in 55
amino acids)

SEQ ID NO: 451: -0.110909, 56, novel, similar to hypothetical 
3255 proteins, for example, L0009 [Escherichia coli] 
gi | 3414877 | gb | aaC31488.1 | (87% identity in 65 amino acids)
SEQ ID NO: 452: -0.405085, 178, novel, similar to hypothetical 
proteins, for example, L0010 [Escherichia coli] 
gi | 3414878 | gb | aaC31489.1 | (81% identity in 111 amino acids);
3260 ydiA [plasmid ColIb-P9] gi | 4512489 | dbj | Baa75138.1 | (37%
identity in 265 amino acids); and L0012 [Escherichia coli]
gi | 3414880 | gb | aaC31491.1 | (80% identity in 61 amino acids)
SEQ ID NO: 453: -0.335897, 79, novel 

SEQ ID NO: 454: 0.984375, 65, a putative integrase, similar to 
3265 integrases, for example, [Escherichia coli prophage e14]
gi | 3024035 | sp | P75969 | INTE#ECOLI (46% identity in 372 
amino acids) 

SEQ ID NO: 455: 0.088596, 115, a putative excisionase, similar 
to excisionase [bacteriophage P21]
3270 gi | 139674 | sp | P27079 | VXIS#BPP21 (31% identity in 73 amino
acids) 

SEQ ID NO: 456: 0.123529, 69, novel, GTG start 
SEQ ID NO: 457: -0.905494, 92, novel, TTG start 
SEQ ID NO: 458: -0.403175, 127, novel, similar to hypothetical 
3275 proteins, for example, YdfA [Escherichia coli] 
gi | 140584 | sp | P29008 | YDFA#ECOLI (91% identity in 49 amino 
acids) 

SEQ ID NO: 459: 0.010435, 116, a putative phage repressor, 
similar to repressor [Escherichia coli Rac prophage]
3280 gi | 3025101 | sp | P76062 | RACR#ECOLI (91% identity in 158 
amino acids) 

SEQ ID NO: 460 : -0.445312, 513, novel, similar to YdaS 
[Escherichia coli] gi | 3025102 | sp | P76063 | YDAS#ECOLI (84% 
identity in 94 amino acids) 
3285 SEQ ID NO: 461 : -0.04875, 81, novel, similar to YdaT 
[Escherichia coli] gi | 3183265 | sp | P76165 | YDFX#ECOLI (31%
identity in 83 amino acids)

SEQ ID NO: 462: -0.425233, 643, novel, similar to C-terminal
part of replication termination protein DnaT (prepriming
3290 protein I) [Escherichia coli] gi | 1361001 | pir | | S56589 (50%
identity in 85 amino acids) 

SEQ ID NO: 463: -0.448868, 531, a putative replication protein, 
similar to replication proteins, for example, protein 14
[Bacteriophage phi-80] gi | 137937 | sp | P14814 | VG14#BPPH8
3295 (47% identity in 129 amino acids), GTG start

SEQ ID NO: 464 : 0.055688, 502, novel, similar to YdaW 
[Escherichia coli] gi | 3025105 | sp | P76066 | YDAW#ECOLI (56%
identity in 143 amino acids) 

SEQ ID NO: 465: -0.024348, 116, novel, GTG start
3300 SEQ ID NO: 466 : -0.331818, 89, novel, similar to Gp57
[Bacteriophage N15] gi | 7459176 | pir | | T13144 (69% identity in
78 amino acids), GTG start 

SEQ ID NO: 467: -0.239801, 202, novel, similar to hypothetical 
protein, for example, [Bacteriophage VT2-Sa] 

3305 gi | 5881670 | dbj | Baa84361.1 | (91% identity in 92 amino 
acids), GTG start 

SEQ ID NO: 468: -0.297006, 168, novel 

SEQ ID NO: 469: -0.163566, 130, novel, similar to hypothetical 
proteins, for example, Ea22 [Bacteriophage lambda]
3310 gi | 137663 | sp | P03756 | VE22#LAMBD (39% identity in 108
amino acids), GTG start 
SEQ ID NO: 470: -0.442375, 860, novel 

SEQ ID NO: 471: -0.447707, 110, novel, its N-terminal part is 
similar to hypothetical proteins, for example, b2363 

3315 [Escherichia coli] gi | 7451977 | pir | | H65009 (51% identity in 95 
amino acids), and its C-terminal part is similar to hypothetical
proteins, for example, [Bacteriophage 933W]
gi | 4585382 | gb | aaD25410.1 | AF125520#5 (43% identity in 75
amino acids) 

3320 SEQ ID NO: 472: -0.339655, 233, novel 






SEQ ID NO: 473 : -0.377251, 212, a prophage maintenance 
protein, similar to Hok/Gef family, for example, MokW
[Bacteriophage 933W]
gi | 4585453 | gb | aaD25481.1 | AF125520#76 (90% identity in 70
3325 amino acids) 

SEQ ID NO: 474 : 0.057985, 227, novel, similar to QD1 
[Bacteriophage N15] gi | 2564084 | gb | aaB81659.1 | (31% identity
in 84 amino acids) 

SEQ ID NO: 475 : -0.939706, 69, novel, similar to b1560
3330 [Escherichia coli] gi | 1742555 | dbj | Baa15259.1 | (82% identity
in 348 amino acids); and hypothetical protein A [phage P1]
gi | 732234 | sp | Q06262 | YORA#BPP1 (26% identity in 314 amino
acids) (also to Orf19 (phi83)), GTG start

SEQ ID NO: 476: -0.161714, 176, a putative crossover junction 

3335 endodeoxyribonuclease, similar to Gp67 [Bacteriophage HK97] 
gi | 6901839 | gb | aaF31142.1 | (59% identity in 110 amino acids);
crossover junction endodeoxyribonuclease Rus [Escherichia
coli cryptic lambdoid prophage DLP12] (41% identity in 107
amino acids); and gi | 2507117 | sp | P40116 | RUS#ECOLI (59%

3340 identity in 110 amino acids) 

SEQ ID NO: 477: -0.277615, 1158, a putative antitermination 
protein, similar to antitermination proteins, for example, 
protein Q [Escherichia coli] gi | 1742554 | dbj | Baa15258.1 | (39% 
identity in 273 amino acids) 

3345 SEQ ID NO: 478: -0.279397, 200, novel, GTG start 
SEQ ID NO: 479: -0.858542, 440, novel, GTG start 
SEQ ID NO: 480: -0.259551, 90, novel, similar to hypothetical 
protein, for example, [Bacteriophage VT2-Sa] 

gi | 5881634 | dbj | Baa84325.1 | (73% identity in 644 amino acids) 

3350 SEQ ID NO: 481, ECs1125: 1209796-1209978, -0.078333, 61, 
novel, similar to hypothetical protein [Bacteriophage 933W] 
gi | 4499806 | emb | CAB39305.1 | (67% identity in 59 amino acids) 
SEQ ID NO: 482: -0.877248, 190, novel, similar to hypothetical 
proteins, for example, [Bacteriophage VT2-Sa] 
3355 gi | 5881635 | dbj | Baa84326.1 | (78% identity in 89 amino acids) 
SEQ ID NO: 483: -0.436667, 61, a putative holin protein, 
similar to holin proteins, for example, S protein [Bacteriophage 
VT2-Sa] gi | 5881636 | dbj | Baa84327.1 | (94% identity in 71 
amino acids) 

3360 SEQ ID NO: 245 : -0.375688, 437, novel, similar to YdfR 
[Escherichia coli] gi | 3183262 | sp | P76160 | YDFR#ECOLI (47% 
identity in 74 amino acids) 

SEQ ID NO: 246: -0.447872, 95, a putative endolysin, similar 
to endolysins, for example, R protein [Bacteriophage 933W] 
3365 gi | 4585422 | gb | aaD25450.1 | AF125520#45 (97% identity in 177 
amino acids) 

SEQ ID NO: 247 : -0.294175, 104, a putative antirepressor 
protein, identical to putative antirepressor protein 
[Bacteriophage 933W] 
3370 gi | 4585423 | gb | aaD25451.1 | AF125520#46, and similar to 
antirepressor protein Ant [Bacteriophage P22] 

gi | 131843 | sp | P03037 | RANT#BPP22 (49% identity in 189 
amino acids) 

SEQ ID NO: 248: -0.781579, 115, an endopeptidase (host cell 
3375 lysis), similar to endopeptidase, for example, Rz [Bacteriophage 
VT2-Sa] gi | 5881639 | dbj | Baa84330.1 | (80% identity in 155 
amino acids) 

SEQ ID NO: 249: -0.371015, 208, a lipoprotein Rz1 precursor, 
similar to lipoprotein Rz1 precursors, for example, 
3380 [Bacteriophage 933W] gi | 540738 | pir | | JN0750 (52% identity in 
59 amino acids); [phage lambda] 

gi | 4585425 | gb | aaD25453.1 | AF125520#48 (76% identity in 59 
amino acids) 

SEQ ID NO: 250: -0.407368, 96, novel 
3385 SEQ ID NO: 251 : 0.416667, 73, novel, similar to hypothetical 
protein [Bacteriophage VT2-Sa] gi | 5881640 | dbj | Baa84331.1 | 
(73% identity in 45 amino acids) 
SEQ ID NO: 252: -0.590526, 96, novel 
SEQ ID NO: 253: -0.644516, 156, novel, similar to hypothetical 
3390 protein [Escherichia coli] gi | 1778472 | gb | aaB40755.1 | (84% 
identity in 53 amino acids) 

SEQ ID NO: 254: -0.557587, 258, a putative DNase, similar to 
putative DNase [Bacteriophage phi-31] 

gi | 1107475 | emb | Caa62587.1 | (28% identity in 85 amino acids) 
3395 SEQ ID NO: 255: -0.615069, 74, a putative terminase small 
subunit, similar to terminase small subunit [Bacillus subtilis 
PBSX phage] gi | 1722886 | sp | P39785 | XTMA#BACSU (42% 
identity in 57 amino acids), GTG start 

SEQ ID NO: 256: -0.595775, 72, a putative large terminase 
3400 subunit, similar to hypothetical proteins, for example, phage 
D3 terminase-like protein [Haemophilus influenzae] 
gi | 6739656 | gb | aaF27357.1 | AF198256#11 (22% identity in 472 
amino acids); and similar to putative large terminase subunit 
[Bacteriophage A2] gi | 3947452 | emb | Caa07103.1 | (25% 
3405 identity in 456 amino acids) 

SEQ ID NO: 257: -0.24127, 64, a putative major head 
protein/prohead protease, its N-terminal-half part is similar to 
putative prohead proteases, for example, Gp4 
[Bacteriophage HK97] gi | 1722780 | sp | P49860 | VP4#BPHK7 

3410 (28% identity in 136 amino acids); and its C-terminal-half part 
is similar to major head protein, for example, [Bacteriophage 
L5] gi | 465114 | sp | Q05223 | VG17#BPML5 (23% identity in 280 
amino acids), GTG start 

SEQ ID NO: 258: -0.248333, 61, a putative portal protein, 
3415 similar to portal protein, for example, [Bacteriophage HK022] 
gi | 6863114 | gb | aaF30355.1 | AF069308#3 (26% identity in 351 
amino acids) 

SEQ ID NO: 259: -0.338496, 227, novel, similar to a novel 
protein [Haemophilus influenzae] 

3420 gi | 6739659 | gb | aaF27360.1 | AF198256#14 (71% identity in 2 .1. 
amino acids), GTG start 

SEQ ID NO: 260: -0.500383, 262, a putative head-tail adaptor, 
similar to putative head-tail adaptors, for example, 
[Bacteriophage HK97] gi | 6901597 | gb | aaF31100.1 | (45% 

3425 identity in 111 amino acids) 

SEQ ID NO: 261: -0.665942, 139, novel, similar to hypothetical 
phage protein, for example, Gp10 [Bacteriophage HK97] 
gi | 6901598 | gb | aaF31101.1 | (75% identity in 148 amino acids) 
SEQ ID NO: 262: 0.008989, 90, novel, similar to Gp11 

3430 [Bacteriophage HK97] gi | 6901599 | gb | aaF31102.1 | (49% 
identity in 113 amino acids) 

SEQ ID NO: 263: -0.544444, 55, a putative major tail subunit, 
similar to major tail subunit [Bacteriophage HK97] 
gi | 6901588 | gb | aaF31091.1 | AF069529#4 (66% identity in 234 

3435 amino acids) 

SEQ ID NO: 264: -0.273771, 123, a putative tail assembly 
chaperone, similar to tail assembly chaperone, for example, pl4 
[Bacteriophage HK97] gi | 6901600 | gb | aaF31103.1 | (62% 
identity in 124 amino acids) 

3440 SEQ ID NO: 265: -0.027711, 84, a putative tail protein [phage 
tail protein], similar to C-terminal part of Gp14 [Bacteriophage 
HK97] gi | 6901601 | gb | aaF31104.1 | (60% identity in 90 amino 
acids), probably produced by translational frameshift 
SEQ ID NO: 266: -0.755556, 91, a putative tail length tape 

3445 measure protein (interrupted), similar to N-terminal part of 
tail length tape measure proteins, for example, [Bacteriophage 
HK97] gi | 6901589 | gb | aaF31092.1 | AF069529#5 (81% identity 
in 137 amino acids) 

SEQ ID NO: 267: -0.881667, 61, a putative tail length tape 
3450 measure protein, similar to C-terminal part of tail length tape 
measure protein, for example, [Bacteriophage HK97] 
gi | 6901589 | gb | aaF31092.1 | AF069529#5 (48% identity in 939 
amino acids), probably disrupted by frameshift 

SEQ ID NO: 268: 0.743396, 54, a putative minor tail protein, 
3455 similar to minor tail protein, for example, GpM [Bacteriophage 
lambda] gi | 138845 | sp | P03737 | VMTM#LAMBD (43% identity in 
110 amino acids), GTG start 

SEQ ID NO: 269: -0.476879, 174, a putative minor tail protein, 
similar to minor tail protein, for example, GpL [Bacteriophage 
3460 lambda] gi | 138844 | sp | P03738 | VMTL#LAMBD (76% identity in 
232 amino acids) 

SEQ ID NO: 270: -0.315668, 218, a putative regulatory protein, 
similar to regulatory protein Mnt [Bacteriophage P22] 
gi | 133138 | sp | P03049 | RMNT#BPP22 (34% identity in 73 amino 
3465 acids) 

SEQ ID NO: 271: -0.295775, 72, a putative antirepressor protein, 
its C-terminal part is similar to antirepressor proteins, for 
example, Ant [Bacteriophage P22] 

gi | 131843 | sp | P03037 | RANT#BPP22 (84% identity in 71 amino 

3470 acids), and its N-terminal part is similar to hypothetical phage 
proteins, for example, Gp30 [Bacteriophage N15] 
gi | 7521545 | pir | | T13116 (35% identity in 175 amino acids) 
SEQ ID NO: 272 : -0.322449, 99, a putative tail assembly 
protein, similar to tail assembly proteins, for example, GpK 

3475 [Bacteriophage lambda] gi | 139638 | sp | P03729 | VTAK#LAMBD 
(86% identity in 196 amino acids) 

SEQ ID NO: 273: -1.166667, 49, a putative tail assembly 
protein, similar to tail assembly protein, for example, GpI 
[Bacteriophage lambda] gi | 139637 | sp | P03730 | VTAI#LAMBD 

3480 (64% identity in 64 amino acids) 

SEQ ID NO: 274: -0.734113, 300, a putative secreted effector 
protein, similar to secreted effector protein SopA [Salmonella 
dublin] gi | 5669806 | gb | aaD46479.1 | AF121227#1 (31% identity 
in 587 amino acids) 

3485 SEQ ID NO: 275: -0.469565, 484, novel, its C-terminal part is 
similar to cytotoxic necrotizing factor type 2 [Escherichia coli] 
gi | 1073353 | pir | | A55260 (31% identity in 244 amino acids) (its 
N-terminus is similar to a novel protein [P. falciparum] (at low 
level)) 

3490 SEQ ID NO: 276: -0.447191, 90, novel 
SEQ ID NO: 277: -0.883696, 93, novel [hypothetical membrane 
protein; IMP], similar to hypothetical protein, for example, 
b0362 [Escherichia coli] gi | 7466098 | pir | | B64764 (50% identity 
in 79 amino acids), [partially similar to hemin receptor 

3495 precursor] 

SEQ ID NO: 278: -0.825352, 72, a transposase (OrfB) protein 
(insertion sequence IS2), similar to hypothetical protein, for 
example, [insertion sequence IS2] 

gi | 140808 | sp | P19777 | YI22#ECOLI (98% identity in 301 amino 

3500 acids), GTG start 

SEQ ID NO: 279: -, 79, novel, [putative transposase (OrfA)], 
similar to hypothetical protein [insertion sequence IS2] 
gi | 140806 | sp | P19776 | YI21#ECOLI (100% identity in 53 amino 
acids) 

3505 SEQ ID NO: 280: -0.735135, 149, novel, similar to hypothetical 
protein [Salmonella typhimurium LT2] 

gi | 6960367 | gb | aaF33527.1 | (72% identity in 37 amino acids) 
SEQ ID NO: 281: -0.217714, 176, novel 

SEQ ID NO: 282: -1.381667, 61, novel, similar to Yop effector 
3510 YopM [Yersinia enterocolitica] gi | 4324334 | gb | aaD16811.1 | 
(25% identity in 171 amino acids), (also weakly to IpaH) 
SEQ ID NO: 283: -0.215789, 58, novel, TTG start 
SEQ ID NO: 284: -0.530738, 245, a putative integrase, similar 
to integrase, for example, [Shigella dysenteriae] 
3515 gi | 6759954 | gb | aaF28112.1 | AF153317#4 (31% identity in 389 
amino acids) 

SEQ ID NO: 285 : -0.205833, 241, a putative DNA binding 
protein; similar to putative DNA binding protein (ORF88) 
[Bacteriophage P4] gi | 140147 | sp | P12552 | Y9K#BPP4 (45% 
3520 identity in 53 amino acids), GTG start 
SEQ ID NO: 286: -1.10199, 202, novel 

SEQ ID NO: 287: -0.534375, 65, a putative cell division 
repressor, similar to cell division repressor Icd [enterobacteria 
phage P1] gi | 4261623 | gb | aaD13923.1 | S61175#1 (42% identity 






3525 in 45 amino acids) 

SEQ ID NO: 288: -0.325, 145, novel 

SEQ ID NO: 289: -0.088, 51, novel 

SEQ ID NO: 290: -0.079937, 320, novel 

SEQ ID NO: 291: -0.191011, 90, novel 
3530 SEQ ID NO: 292: -0.281545, 635, novel 

SEQ ID NO: 293: -0.397973, 297, novel 

SEQ ID NO: 294: -0.965741, 109, novel 

SEQ ID NO: 295: 0.008475, 60, novel 

SEQ ID NO: 296: -0.431081, 149, novel 
3535 SEQ ID NO: 297: 0.039437, 72, a putative single stranded 

DNA-binding protein, similar to single stranded DNA-binding 

proteins, for example, [Thermotoga maritima] 

gi | 7439946 | pir | | H72354 (35% identity in 96 amino acids) 

SEQ ID NO: 298: -0.449153, 178, a putative transcription 
3540 activator, similar to transcription activator of eaeA/bfpA, PerC 

(BfpW) [Escherichia coli] gi | 1172431 | sp | P43475 | PERC#ECOLI 

(39% identity in 89 amino acids) 

SEQ ID NO: 299: -0.283069, 190, novel 

SEQ ID NO: 300: -0.520779, 155, a putative major head protein, 
3545 similar to major head protein, for example, phage phi-C31 
gp36-like protein [Haemophilus influenzae] 

gi | 6739663 | gb | aaF27364.1 | AF198256#18 (AF198256) (56% 
identity in 584 amino acids) 

SEQ ID NO: 301 : 0.198361, 62, a putative prohead protease, 
3550 similar to prohead proteases, for example, phage phi-C31 
gp35-like protein [Haemophilus influenzae] 

gi | 6739662 | gb | aaF27363.1 | AF198256#17 (60% identity in 161 
amino acids) 

SEQ ID NO: 302: 0.183505, 98, a putative head portal protein, 
3555 similar to head portal proteins, for example, phage phi-105 
ORF25-like protein [Haemophilus influenzae] 
gi | 6739661 | gb | aaF27362.1 | AF198256#16 (63% 
identity in 403 amino acids) 
SEQ ID NO: 303: -0.097403, 78, a putative head-tail adaptor, 
3560 similar to head-tail adaptors, for example, [Bacteriophage 
HK97] gi | 6901597 | gb | aaF31100.1 | (47% identity in 112 amino 
acids) 

SEQ ID NO: 304: -0.730597, 269, novel, similar to hypothetical 
protein [Haemophilus influenzae] 

3565 gi | 6739659 | gb | aaF27360.1 | AF198256#14 (45% identity in 98 
amino acids); and hypothetical protein 30 [Bacillus phage 
phi-105] gi | 7459182 | pir | | T13519 (26% identity in 90 amino 
acids) 

SEQ ID NO: 305: -0.554049, 569, novel, similar to hypothetical 
3570 protein, for example, [Haemophilus influenzae] 

gi | 6739658 | gb | aaF27359.1 | AF198256#13 (54% identity in 115 
amino acids) 

SEQ ID NO: 306: -0.527872, 715, novel 

SEQ ID NO: 307: -0.766567, 336, a putative terminase small 
3575 subunit, similar to hypothetical protein, genetic island 1 
[Haemophilus influenzae] 

gi | 6739657 | gb | aaF27358.1 | AF198256#12 (64% identity in 112 
amino acids); and similar to putative terminase small subunit 
[Streptococcus thermophilus bacteriophage Sfi21] 

3580 gi | 5230826 | gb | aaD41028.1 | AF112470#3 (29% identity in 98 
amino acids). 

SEQ ID NO: 308: -0.398762, 405, a putative terminase large 
subunit, similar to terminase large subunits, for example, 
[Haemophilus influenzae] 

3585 gi | 6739656 | gb | aaF27357.1 | AF198256#11 (69% identity in 550 
amino acids), TTG start 
SEQ ID NO: 309: 0.25969, 130, novel 
SEQ ID NO: 310: -0.52549, 154, novel, GTG start 
SEQ ID NO: 311 : -0.157219, 188, an integrase, similar to 

3590 integrases, for example, [Bacteriophage P21] 

gi | 138558 | sp | P27077 | VINT#BPP21 (98% identity in 380 amino 
acids), (similar to lambda integrase) 
SEQ ID NO: 312: 0.063889, 217, an excisionase, similar to 
excisionases, for example, [Bacteriophage P21] 

3595 gi | 139674 | sp | P27079 | VXIS#BPP21 (98% identity in 78 amino 
acids) 

SEQ ID NO: 313: -0.793334, 646, a putative replication protein, 
similar to replication protein, for example, GpO [Bacteriophage 
lambda] gi | 215150 | gb | aaA96584.1 | (69% identity in 261 amino 
3600 acids) 

SEQ ID NO: 314: -0.266292, 90, a replication protein, similar 
to replication proteins, for example, GpP [Bacteriophage 
lambda] gi | 4499785 | emb | CAB39284.1 | (98% identity in 233 
amino acids) 

3605 SEQ ID NO: 315: -0.19875, 81, a putative Ren protein 
(protection from Rex-dependent exclusion), similar to Ren 
protein, for example, [Bacteriophage lambda] 

gi | 139473 | sp | P03761 | VREN#LAMBD (90% identity in 92 
amino acids) 

3610 SEQ ID NO: 316 : 0.06375, 81, integral membrane drug- 
resistance protein EmrE, similar to ethidium efflux protein 
EmrE (methyl viologen resistance protein C) [E. coli] 
gi | 127565 | sp | P23895 | EMRE#ECOLI (98% identity in 110 
amino acids), and belongs to the small multidrug resistance 

3615 (Smr) protein family 

SEQ ID NO: 317: -0.018342, 568, novel, similar to hypothetical 
protein YbcK [Escherichia coli] 

gi | 2495549 | sp | P77698 | YBCK#ECOLI (99% identity in 508 
amino acids); and putative integrase [Bacteriophage A118] 

3620 gi | 1196324 | gb | aaB51416.1 | (31% identity in 109 amino acids) 
SEQ ID NO: 318: -0.248578, 423, novel, similar to hypothetical 
protein YbcN [Escherichia coli cryptic lambdoid prophage 
DLP12] gi | 2495551 | sp | Q47269 | YBCN#ECOLI (92% identity in 
151 amino acids), GTG start 

3625 SEQ ID NO: 319: -0.218478, 93, novel, identical to NinE 
[Bacteriophage 82] gi | 3024190 | sp | Q37871 | NINE#BP82 
SEQ ID NO: 320: -0.159512, 206, novel, similar to YbcO 
[Escherichia coli cryptic prophage DLP12] 

gi | 2495553 | sp | Q47271 | YBCO#ECOLI (97% identity in 98 

3630 amino acids); and Gp66 [Bacteriophage HK97] 
gi | 6901638 | gb | aaF31141.1 | (68% identity in 95 amino acids) 
SEQ ID NO: 321 : -0.289344, 245, a crossover junction 
endodeoxyribonuclease, similar to crossover junction 
endodeoxyribonucleases Rus, for example, [Escherichia coli 

3635 bacteriophage 82] gi | 2498868 | sp | Q37873 | RUS#BP82 (95% 
identity in 120 amino acids), GTG start 

SEQ ID NO: 322: -0.103759, 134, a putative antitermination 
protein, similar to antitermination protein, for example, 
Q [Bacteriophage 82] gi | 132277 | sp | P13870 | RegQ#BP82 (98% 
3640 identity in 229 amino acids) 

SEQ ID NO: 323: -0.622936, 219, a putative holin, similar to 
putative holin protein [Bacteriophage PS3] 

gi | 3676074 | emb | Caa09700.1 | (72% identity in 103 amino 
acids), TTG start 

3645 SEQ ID NO: 324 : -0.662162, 149, a putative endolysin 
(lysozyme), similar to endolysins, for example, [Bacteriophage 
HK97] gi | 6901642 | gb | aaF31145.1 | (95% identity in 158 amino 
acids) 
[0019] 

3650 2) Proteins which have novel function, but have significant 
homology 

Sd], ,> n i-ht 1 >b;.m, Tj n\i:iba ol amino acids, 

* i: :• v.: V- : • ':' • : K' fi , x - f I U_C tj ) 1 

SEQ ID NO: 325: -0.109639, 84, a putative endopeptidase 
3655 (host cell lysis), similar to hypothetical protein gp15 
[Bacteriophage PS119] gi | 3676087 | emb | Caa09711.1 | (83% 
identity in 155 amino acids); endopeptidases, for 
example, [Bacteriophage lambda] gi | 67522 | pir | | APBPML (59% 
identity in 153 amino acids) 
3660 SEQ ID NO: 326: -0.749881, 422, a putative lipoprotein Rz1 
precursor, similar to lipoprotein Rz1 precursors, for example, 
[Bacteriophage lambda] (53% identity in amino acids) 
SEQ ID NO: 327: -0.631149, 2794, novel 

SEQ ID NO: 328 : -0.122951, 62, novel [hypothetical 
3665 membrane protein; IMP] 

SEQ ID NO: 329: -0.232456, 115, novel 

SEQ ID NO: 330: 0.222857, 71, a putative terminase large 
subunit, similar to terminase large subunits, for example, 
[Bacteriophage WO] gi | 6723224 | dbj | Baa89621.1 | (26% 
3670 identity in 641 amino acids); for example, [Bacteriophage N15] 
gi | 7444579 | pir | | T13088 (25% identity in 630 amino acids) 
SEQ ID NO: 331 : -0.754198, 132, novel 

SEQ ID NO: 332: -0.709589, 220, a putative portal protein, 
similar to putative portal protein [Wolbachia sp. 
3675 wKue] gi | 6723246 | dbj | Baa89642.1 | (23% identity in 294 amino 
acids), GTG start 

SEQ ID NO: 333: -0.319445, 73, novel 

SEQ ID NO: 334: -0.243617, 95, a putative protease/scaffold 
protein, partially similar to ClpP proteases, for example, 

3680 [Bacteriophage D3] gi | 5059251 | gb | aaD38956.1 | (35% identity 
in 218 amino acids); similar to putative scaffolding protein 
[Streptococcus thermophilus bacteriophage DT1] 

gi | 4530143 | gb | aaD21883.1 | (30% identity in 201 amino acids) 
SEQ ID NO: 335: -0.664384, 74, novel, TTG start 

3685 SEQ ID NO: 336: -0.528708, 210, novel 

SEQ ID NO: 1570: -0.651901, 448, similar to minor tail proteins, 
for example, protein Z [Bacteriophage N15] 

gi | 7521219 | pir | | T13097 (52% identity in 192 amino acids); 
GpZ [Bacteriophage lambda] 

3690 gi | 138849 | sp | P03731 | VMTZ#LAMBD (49% identity in 192 
amino acids) 

SEQ ID NO: 1030: 0.101176, 511, a putative minor tail 
component, similar to minor tail proteins, for example, protein 
U [Bacteriophage N15] gi | 7444588 | pir | | T13098 (49% identity 
3695 in 129 amino acids); GpU [Bacteriophage lambda] 
gi | 138847 | sp | P03732 | VMTU#LAMBD (49% identity in 129 
amino acids) 

SEQ ID NO: 1031 : -0.163804, 164, a major tail component, 
similar to major tail proteins, for example, protein V 
3700 [Bacteriophage N15] gi | 7444589 | pir | | T13099 (62% identity in 
244 amino acids); GpV [Bacteriophage lambda] 
gi | 138848 | sp | P03733 | VMTV#LAMBD (55% identity in 246 
amino acids) 

SEQ ID NO: 1032: -0.270741, 271, a minor tail component, 
3705 similar to minor tail proteins, for example, GpG [Bacteriophage 
lambda] gi | 138842 | sp | P03734 | VMTG#LAMBD (33% identity in 
109 amino acids) 

SEQ ID NO: 1033 : 0.038403, 264, a putative minor tail 
component, similar to minor tail proteins, for example, GpT 
3710 [Bacteriophage lambda] gi | 138846 | sp | P03735 | VMTT#LAMBD 
(39% identity in 104 amino acids), probably produced by 
translational frameshift 

SEQ ID NO: 1034: -0.454546, 210, a putative tail length tape 
measure protein precursor, similar to tail length tape measure 
3715 protein precursors, for example, GpH [Bacteriophage lambda] 
gi | 138843 | sp | P03736 | VMTH#LAMBD (25% identity in 822 
amino acids) 

SEQ ID NO: 1035: -0.041442, 445, a putative minor tail 
protein, similar to minor tail proteins, for example, GpM 
3720 [Bacteriophage lambda] gi | 138845 | sp | P03737 | VMTM#LAMBD 
(55% identity in 108 amino acids) 

SEQ ID NO: 1036: -0.442976, 841, a putative minor tail 
protein, similar to minor tail proteins, for example, GpL 
[Bacteriophage lambda] gi | 138844 | sp | P03738 | VMTL#LAMBD 
3725 (93% identity in 232 amino acids) 

SEQ ID NO: 1037: -0.153648, 234, a putative tail assembly 
protein, similar to tail assembly proteins, for example, GpK 
[Bacteriophage lambda] gi | 139638 | sp | P03729 | VTAK#LAMBD 
(97% identity in 199 amino acids) 
3730 SEQ ID NO: 1038: 0.21129, 187, a putative tail assembly 
protein, similar to tail assembly proteins, for example, GpI 
[Bacteriophage lambda] gi | 139637 | sp | P03730 | VTAI#LAMBD 
(80% identity in 215 amino acids) 

SEQ ID NO: 1039: -0.061353, 208, a putative host specificity 
3735 protein, similar to host specificity proteins, for example, GpJ 
[Bacteriophage lambda] gi | 138412 | sp | P03749 | VHSJ#LAMBD 
(88% identity in 1136 amino acids) 

SEQ ID NO: 1040: -0.166719, 1269, a putative outer membrane 
protein precursor, similar to outer membrane protein Lom 
3740 precursors, for example, [prophage P-EibA] 

gi | 7532789 | gb | aaF63231.1 | AF151091#2 (72% identity in 199 
amino acids) 

SEQ ID NO: 1041: -0.41948, 540, a putative tail fiber 
protein, similar to tail fiber proteins, for 
3745 example, [Bacteriophage 933W] 

gi | 4585436 | gb | aaD25464.1 | AF125520#59 (67% identity in 277 
amino acids) 

SEQ ID NO: 1042: 0.009016, 123, novel, similar to 
hypothetical proteins, for example, [Bacteriophage 933W] 
3750 gi | 4585437 | gb | aaD25465.1 | AF125520#60 (98% identity in 102 
amino acids) 

SEQ ID NO: 1043: 0.422222, 190, novel, similar to 
hypothetical protein [Salmonella typhimurium LT2] 

gi | 6960367 | gb | aaF33527.1 | (55% identity in 314 amino acids) 
3755 SEQ ID NO: 1044: -0.17033, 183, novel 
SEQ ID NO: 1045: -0.29785, 94, novel 
SEQ ID NO: 1046: -0.139896, 387, novel 
SEQ ID NO: 1047: -0.09284, 853, novel 

SEQ ID NO: 1048: -0.12362, 327, novel, similar to secreted 
3760 effector protein SopA, [Salmonella dublin] 

gi | 5669806 | gb | aaD46479.1 | AF121227#1 (24% identity in 296 
amino acids), similar to hypothetical proteins, for 
example, YjbI [Escherichia coli] 

gi | 418540 | sp | P32690 | YJBI#ECOLI (26% identity in 183 amino 
3765 acids), weakly 

SEQ ID NO: 1049 : -0.341696, 284, novel [hypothetical 
membrane protein; IMP] 

SEQ ID NO: 1050: 0.074894, 236, a putative PTS transporter 
protein, similar to putative transporter proteins, for 
3770 example, SgaT [Escherichia coli] 

gi | 2851673 | sp | P39301 | SGAT#ECOLI (38% identity in 440 
amino acids) 

SEQ ID NO: 1051: -0.083945, 219, a putative PTS system 
enzyme II, similar to phosphotransferase system enzymes IIB, 
3775 for example, [Escherichia coli] 

gi | 732028 | sp | P39302 | PTXB#ECOLI (28% identity in 99 amino 
acids) 

SEQ ID NO: 1052: 0.436468, 437, novel 
SEQ ID NO: 1053: -0.546947, 263, novel, GTG start 
3780 SEQ ID NO: 1054: -0.377489, 463, novel 
SEQ ID NO: 133: -0.3865, 401, unknown 

SEQ ID NO: 134: -0.199834, 606, a putative integrase, 
similar to integrases, for example, [Bacteriophage HK022] 
gi | 138560 | sp | P16407 | VINT#BPHK0 (27% identity in 321 
3785 amino acids) 

SEQ ID NO: 135: -0.420689, 146, novel 
SEQ ID NO: 136: -0.487755, 99, novel 

SEQ ID NO: 137: -0.331236, 462, novel, similar to 
hypothetical proteins, for example, YdfD [Escherichia coli] 
3790 gi | 140587 | sp | P29010 | YDFD#ECOLI (63% identity in 63 amino 
acids) 

SEQ ID NO: 138: -0.780214, 188, a putative cell division 
inhibitor, similar to dicB [Escherichia coli] 
gi | 2507009 | sp | P09557 | DICB#ECOLI (54% identity in 62 
3795 amino acids) 

SEQ ID NO: 139: -0.17888, 787, novel, TTG start 
SEQ ID NO: 140: 0.226, 51, novel 
SEQ ID NO: 141: -0.445312, 513, novel 
SEQ ID NO: 142: 0.010435, 116, novel 
3800 SEQ ID NO: 143 : -0.395489, 134, novel, similar to YdfB 
[Escherichia coli] gi | 140585 | sp | P29009 | YDFB#ECOLI (100% 
identity in 41 amino acids) 

SEQ ID NO: 144: -0.538835, 104, novel, identical to YdfA 
[Escherichia coli] gi | 140584 | sp | P29008 | YDFA#ECOLI (100% 

3805 identity in 51 amino acids) 

SEQ ID NO: 145: -0.684191, 273, novel, TTG start 
SEQ ID NO: 146: -0.275807, 249, novel, similar to 
hypothetical proteins, for example, yaeB [plasmid ColIb-P9] 
gi | 4512441 | dbj | Baa75090.1 | (35% identity in 92 amino acids) 

3810 SEQ ID NO: 147: -0.519277, 84, novel 

SEQ ID NO: 148: -0.448958, 97, a putative regulatory protein, 
similar to putative regulatory protein [Salmonella 

typhimurium] gi | 7467281 | pir | | T03008 (30% identity in 108 
amino acids); DicA [Escherichia coli] 

3815 gi | 118631 | sp | P06966 | DICA#ECOLI (27% identity in 108 amino 
acids) 

SEQ ID NO: 149: -0.025758, 67, novel 

SEQ ID NO: 150 : 0.918487, 120, novel, similar to YdaT 
[Escherichia coli] gi | 3025103 | sp | P76064 | YDAT#ECOLI (31% 
3820 identity in 141 amino acids) 

SEQ ID NO: 151 : -0.246963, 429, novel 
SEQ ID NO: 152: 0.574468, 48, novel 

SEQ ID NO: 153: 0.214286, 92, a putative DNA replication 
protein, similar to DnaC homolog [Escherichia coli] 
3825 gi | 7429001 | pir | | C64886 (79% identity in 248 amino acids); 

DnaC [Escherichia coli] gi | 118715 | sp | P07905 | DNAC#ECOLI 
(48% identity in 242 amino acids) 

SEQ ID NO: 154: -0.016418, 68, novel, similar to 
gi | 3025105 | sp | P76066 | YDAW#ECOLI (54% identity in 155 
3830 amino acids) 
SEQ ID NO: 155: -0.025506, 248, novel 

SEQ ID NO: 156: 0.022, 101, novel, similar to hypothetical 
proteins, for example, IroE [Salmonella enterica] 

gi | 2738251 | gb | aaC46182.1 | (29% identity in 249 amino acids) 
3835 SEQ ID NO: 157: -0.369811, 107, novel 
SEQ ID NO: 158: -0.00581, 569, novel 

SEQ ID NO: 159: -0.291558, 155, a putative prophage 
maintenance protein, similar to Hok/Gef family, for 
example, MokW [Bacteriophage 933W] 

3840 gi | 4585453 | gb | aaD25481.1 | AF125520#76 (92% identity in 65 
amino acids) 

SEQ ID NO: 160 : -0.194196, 225, novel, similar to QD1 
[Bacteriophage N15] gi | 2564084 | gb | aaB81659.1 | (31% identity 
in 64 amino acids) 

3845 SEQ ID NO: 161: -0.083415, 206, novel 

SEQ ID NO: 162 : -0.462832, 114, a putative crossover 
junction endodeoxyribonuclease, similar to Gp67 [Bacteriophage 
HK97] gi | 6901639 | gb | aaF31142.1 | (60% identity in 113 amino 
acids); crossover junction endodeoxyribonuclease Rus 

3850 [Escherichia coli cryptic prophage DLP12] 

gi | 2507117 | sp | P40116 | RUS#ECOLI (40% identity in 115 amino 
acids) 

SEQ ID NO: 163 : 0.998039, 52, a putative antitermination 
protein, similar to bacteriophage antitermination proteins 
3855 for example, YbcQ [Escherichia coli cryptic prophage DLP12] 
gi | 4585416 | gb | aaD25444.1 | AF125520#39 (77% identity in 124 
amino acids) 

SEQ ID NO: 164 : -0.436782, 88, novel, similar to 
[hypothetical membrane protein] YpbD [Bacillus subtilis] 
3860 gi | 1730886 | sp | P50730 | YPBD#BACSU (30% identity in 128 
amino acids) 

SEQ ID NO: 165: -0.286022, 94, novel, similar to hypothetical 
protein [Bacteriophage P27] gi | 8346569 | emb | CAB93762.1 | 
(97% identity in 49 amino acids) 
3865 SEQ ID NO: 166: 0.757522, 114, a putative transcription 
regulatory element, similar to transcription regulatory 
elements, for example, YhiW [Escherichia coli] 

gi | 586679 | sp | P37638 | YHIW#ECOLI (37% identity in 187 
amino acids) 

3870 SEQ ID NO: 167: 0.175785, 224, novel, similar to 
hypothetical proteins, for example, [Bacteriophage 933W] 
gi | 4585419 | gb | aaD25447.1 | AF125520#42 (53% identity in 613 
amino acids) 

SEQ ID NO: 168: -0.464706, 52, a transposase, identical to 
3875 hypothetical protein [Escherichia coli plasmid pO157 
insertion sequence IS629] gi | 7444868 | pir | | T00241 (100% 
identity in 116 amino acids) 

SEQ ID NO: 169: -0.152174, 254, a putative transposase, 
similar to transposases, for example, [Escherichia coli 

3880 plasmid pO157 insertion sequence IS629] 

gi | 7443862 | pir | | T00240 (98% identity in 220 amino acids) 
SEQ ID NO: 170: -0.400502, 200, a putative transcription 
regulatory element, similar to PerC (BfpW) [Escherichia coli] 
gi | 1172431 | sp | P43475 | PERC#ECOLI (47% identity in 87 

3885 amino acids) 

SEQ ID NO: 171: -0.431915, 142, a lipoprotein Rz1 protein 
precursor, similar to Rz1 precursors, for 

example, [Bacteriophage 933W] 

gi | 4585425 | gb | aaD25453.1 | AF125520#48 (98% identity in 61 

3890 amino acids); [Bacteriophage lambda] 

gi | 540738 | pir | | JN0750 (70% identity in 61 amino acids) 
SEQ ID NO: 172: -0.121552, 117, an endopeptidase (host cell 
lysis), similar to endopeptidases, for example, [Bacteriophage 
VT2-Sa] gi | 5881639 | dbj | Baa84330.1 | (88% identity in 154 

3895 amino acids) 

SEQ ID NO: 173: -0.561452, 538, novel 
SEQ ID NO: 174: -0.275207, 243, novel 

SEQ ID NO: 175: -0.345833, 121, a host cell lysis protein, similar to 
endolysins, for example, [Bacteriophage H-19B] 

3900 gi | 4335686 | gb | aaD17382.1 | (94% identity in 177 amino acids) 
SEQ ID NO: 176: -0.521101, 110, novel 
SEQ ID NO: 177: -0.46, 156, novel 
SEQ ID NO: 178: -0.444527, 403, novel 

SEQ ID NO: 179: -0.033648, 319, a holin protein (host cell 
3905 lysis), similar to holin proteins, for example, [Bacteriophage 
VT2-Sa] gi | 5881636 | dbj | Baa84327.1 | (91% identity in 69 
amino acids) 

SEQ ID NO: 180: 0.066393, 245, novel, GTG start 
SEQ ID NO: 181: -0.292064, 127, novel, similar to 
3910 hypothetical proteins, for example, L0013 [Escherichia coli 
O157:H7 strain EDL933] gi | 3414881 | gb | aaC31492.1 | (99% 
identity in 133 amino acids) 

SEQ ID NO: 182: -0.271985, 258, novel, identical to 
hypothetical proteins, for example, L0014 [Escherichia coli 
3915 O157:H7 strain EDL933] gi | 3414882 | gb | aaC31493.1 | (100% 
identity in 115 amino acids) 

SEQ ID NO: 183: -0.112369, 381, novel, similar to 
hypothetical proteins, for example, L0015 [Escherichia coli 
O157:H7 strain EDL933] gi | 3414883 | gb | aaC31494.1 | (100% 

3920 identity in 512 amino acids) 

SEQ ID NO: 184: -0.165341, 353, a putative terminase small 
subunit, similar to C-terminal part of terminase small subunits 
for example, [Bacteriophage N15] 

gi | 2507082 | sp | P31061 | NOHA#ECOLI (46% identity in 75 

3925 amino acids), GTG start, probably disrupted by IS insertion 

SEQ ID NO: 185: -0.206736, 194, a terminase large subunit, 
similar to terminase large subunits, for 
example, [Bacteriophage 21] 

gi | 2851579 | sp | P36693 | TERL#BPP21 (91% identity in 637 

3930 amino acids) 

SEQ ID NO: 186: -0.392375, 342, a portal protein, similar to 
portal proteins, for example, GP4 [Bacteriophage P21] 
gi | 549295 | sp | P36272 | VG04#BPP21 (98% identity in 530 amino 
acids) 

SEQ ID NO: 187: -0.188742, 152, a head-tail preconnector
protein, similar to head-tail preconnector proteins for
example, Gp5 [Bacteriophage P21]
gi | 549296 | sp | P36273 | VG05#BPP21 (97% identity in 501 amino
acids), GTG start

SEQ ID NO: 188: 0.734105, 347, a head decoration protein,
similar to head decoration proteins for example, Gpshp
[Bacteriophage P21] gi | 549437 | sp | P36275 | VSHP#BPP21 (95%
identity in 115 amino acids)

SEQ ID NO: 189: -0.317188, 193, a possible major head protein,
similar to N-terminal part of major head proteins for
example, Gp7 [Bacteriophage P21]
gi | 547612 | sp | P36270 | HEAD#BPP21 (95% identity in 88 amino
acids)

SEQ ID NO: 190: -0.249738, 192, novel 
SEQ ID NO: 191: 0.297015, 68, a putative tail component,
similar to minor tail proteins for example, GpG
[Bacteriophage lambda] gi | 138842 | sp | P03734 | VMTG#LAMBD
(68% identity in 143 amino acids)

SEQ ID NO: 192: -0.083333, 103, a putative minor tail
component, similar to minor tail protein GpG-T [Bacteriophage
lambda] gi | 7429179 | pir | | TLBPTL (72% identity in 124 amino
acids), probably produced by translational frameshift

SEQ ID NO: 193: 0, 75, a tail length determinator, similar to tail
length tape measure proteins for example, GpH
[Bacteriophage lambda] gi | 138843 | sp | P03736 | VMTH#LAMBD
(77% identity in 859 amino acids)

SEQ ID NO: 194: -0.427011, 697, a minor tail component,
similar to minor tail proteins for example, GpM
[Bacteriophage lambda] gi | 138845 | sp | P03737 | VMTM#LAMBD
(82% identity in 109 amino acids)

SEQ ID NO: 195: 0.565, 41, a minor tail component, similar to
minor tail proteins for example, GpL [Bacteriophage lambda]
gi | 138844 | sp | P03738 | VMTL#LAMBD (76% identity in 232
amino acids)

SEQ ID NO: 196: 0.101111, 91, a tail assembly protein,
similar to tail assembly proteins for example, GpK
[Bacteriophage lambda] gi | 139638 | sp | P03729 | VTAK#LAMBD
(84% identity in 196 amino acids)

SEQ ID NO: 197: -0.5, 51, a tail assembly protein, similar to
tail assembly proteins for example, GpI [Bacteriophage
lambda] gi | 139637 | sp | P03730 | VTAI#LAMBD (68% identity in
224 amino acids)

SEQ ID NO: 198: -1.1875, 65, novel 

SEQ ID NO: 199: -0.140541, 75, a copper/zinc superoxide
dismutase, similar to copper/zinc superoxide dismutases for
example, [Salmonella typhimurium]
gi | 2462699 | emb | Caa73588.1 | (58% identity in 175 amino
acids)

SEQ ID NO: 200: -0.113333, 91, a putative host specificity
protein, similar to host specificity proteins for example, GpJ
[Bacteriophage lambda] gi | 138412 | sp | P03749 | VHSJ#LAMBD
(65% identity in 1156 amino acids)

SEQ ID NO: 201: -0.59375, 65, a putative outer membrane
protein, similar to Lom outer membrane proteins for
example, [prophage P-EibA]
gi | 7532789 | gb | aaF63231.1 | AF151091#2 (68% identity in 199
amino acids)

SEQ ID NO: 202: 0.147917, 49, a putative tail fiber protein,
similar to putative tail fiber proteins for
example, [Bacteriophage 933W]
gi | 4585436 | gb | aaD25464.1 | AF125520#59 (38% identity in 370
amino acids)

SEQ ID NO: 203: -0.707843, 103, novel, similar to
hypothetical protein [Bacteriophage 933W]
gi | 4585437 | gb | aaD25465.1 | AF125520#60 (93% identity in 129
amino acids); similar to C-terminal part of putative tail
protein [933W] gi | 4585436 | gb | aaD25464.1 | AF125520#59 (93%
identity in 89 amino acids)

SEQ ID NO: 204: 0.03369, 375, novel, GTG start 
SEQ ID NO: 205: -0.295604, 92, a putative secreted effector
protein, similar to EspF proteins for example, [Escherichia
coli strain E2348/69] gi | 2865308 | gb | aaC38400.1 | (37%
identity in 87 amino acids); L0016 - Escherichia coli
gi | 3414884 | gb | aaC31495.1 | (38% identity in 126 amino acids)
SEQ ID NO: 206: -0.495808, 168, novel, partially similar to
avirulence protein A [Pseudomonas syringae]
gi | 114726 | sp | P11437 | AVRA#PSESG (46% identity in 56 amino
acids)

SEQ ID NO: 207: -0.350549, 92, a putative integrase,
identical to integrase [Bacteriophage 933W]
gi | 4585378 | gb | aaD25406.1 | AF125520#1, but [having]
different start; similar to integrases for
example, [Escherichia coli rac prophage]
gi | 6166234 | sp | P76056 | INTR#ECOLI (42% identity in 408
amino acids)

SEQ ID NO: 208: 0.199342, 153, a putative excisionase,
identical to putative excisionase [Bacteriophage 933W]
gi | 4585379 | gb | aaD25407.1 | AF125520#2

SEQ ID NO: 209: 0.463492, 64, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 4585380 | gb | aaD25408.1 | AF125520#3

SEQ ID NO: 210: -0.033136, 170, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 4585381 | gb | aaD25409.1 | AF125520#4, but [having]
different start

SEQ ID NO: 211: -0.402415, 208, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 4585382 | gb | aaD25410.1 | AF125520#5; similar to
hypothetical protein [Bacteriophage 933W]
gi | 4585455 | gb | aaD25483.1 | AF125520#78 (50% identity in 80
amino acids)

SEQ ID NO: 212: -0.577922, 78, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 4585383 | gb | aaD25411.1 | AF125520#6 (100% identity in 95
amino acids)

SEQ ID NO: 213: 0.356338, 72, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 4585384 | gb | aaD25412.1 | AF125520#7 (100% identity in 72
amino acids), GTG start

SEQ ID NO: 214: -0.410847, 296, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 4585385 | gb | aaD25413.1 | AF125520#8 (100% identity in 95
amino acids), GTG start

SEQ ID NO: 215: -0.942593, 109, novel, identical to
hypothetical protein [Bacteriophage VT2-Sa]
gi | 5881600 | dbj | Baa84291.1 | (100% identity in 155 amino
acids)

SEQ ID NO: 216: -0.260656, 245, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 4585386 | gb | aaD25414.1 | AF125520#9 (100% identity in 257
amino acids); similar to hypothetical proteins for
example, [Bacteriophage 933W]
gi | 4585455 | gb | aaD25483.1 | AF125520#78 (95% identity in 157
amino acids), GTG start

SEQ ID NO: 217: -0.421638, 172, novel, similar to C4-type
zinc finger proteins (TraR family) for
example, gi | 4585456 | gb | aaD25484.1 | AF125520#79 (79%
identity in 73 amino acids)
SEQ ID NO: 218: -0.312093, 646, novel, identical to
hypothetical protein [Bacteriophage 933W], but [having]
different start; similar to orf61 [Bacteriophage lambda]
gi | 508993 | gb | aaA96566.1 | (93% identity in 46 amino acids)
SEQ ID NO: 219: -0.186957, 47, novel, identical to
hypothetical protein [Bacteriophage VT2-Sa]
gi | 5881603 | dbj | Baa84294.1 | (100% identity in 63 amino
acids); similar to orf63 [Bacteriophage lambda]
gi | 508994 | gb | aaA96567.1 | (90% identity in 61 amino acids)
SEQ ID NO: 220: -0.418537, 411, novel, identical to
hypothetical protein [Bacteriophage VT2-Sa]
gi | 5881604 | dbj | Baa84295.1 |, but [having] different start;
similar to orf60a [Bacteriophage lambda]
gi | 508995 | gb | aaA96568.1 | (96% identity in 60 amino acids)
SEQ ID NO: 221: -0.531132, 213, an exonuclease, similar to
exonuclease [Bacteriophage lambda] gi | 2981722 | pdb | 1AVQ | A
(98% identity in 226 amino acids)

SEQ ID NO: 222: -0.137079, 90, a recombination protein Bet,
identical to Bet [Bacteriophage VT2-Sa]
gi | 5881606 | dbj | Baa84297.1 | (100% identity in 261 amino
acids); similar to Bet [Bacteriophage lambda]
gi | 137511 | sp | P03698 | VBET#LAMBD (99% identity in 261
amino acids)

SEQ ID NO: 223: -0.533645, 215, a host-nuclease inhibitor
protein Gam, similar to Gam proteins for
example, [Bacteriophage lambda]
gi | 138128 | sp | P03702 | VGAM#LAMBD (97% identity in 138
amino acids)

SEQ ID NO: 224: -0.435294, 52, a Kil protein, identical to Kil
[Bacteriophage VT2-Sa] gi | 5881608 | dbj | Baa84299.1 |; similar
to Kil proteins for example, [Bacteriophage lambda]
gi | 138622 | sp | P03758 | VKIL#LAMBD (98% identity in 89 amino
acids)

SEQ ID NO: 225: -0.714458, 167, a regulatory protein cIII
(antitermination), identical to cIII [Bacteriophage lambda]
gi | 133366 | sp | P03044 | RPC3#LAMBD (100% identity in 54
amino acids)

SEQ ID NO: 226: 0.126027, 74, a single-strand binding protein
Ea10, identical to Ea10 [Bacteriophage VT2-Sa]
gi | 5881610 | dbj | Baa84301.1 | (100% identity in 122 amino
acids); similar to Ea10 [Bacteriophage lambda]
gi | 137630 | sp | P03757 | VE10#LAMBD (99% identity in 122
amino acids)

SEQ ID NO: 227: -0.575177, 142, novel, identical to
hypothetical protein [Bacteriophage VT2-Sa]
gi | 5881612 | dbj | Baa84303.1 | (100% identity in 83 amino acids)
SEQ ID NO: 228: -1.413333, 61, a putative anti-termination
N protein, identical to N protein [Bacteriophage VT2-Sa]
gi | 5881613 | dbj | Baa84304.1 |, but [having] different start;
similar to N proteins for example, [Bacteriophage 933W]
gi | 4585397 | gb | aaD25425.1 | AF125520#20 (42% identity in 90
amino acids)

SEQ ID NO: 229: -0.125172, 291, novel 
SEQ ID NO: 230: -0.297787, 950, novel 

SEQ ID NO: 231: -0.469647, 795, novel, identical to
hypothetical protein [Bacteriophage VT2-Sa]
gi | 5881614 | dbj | Baa84305.1 | (100% identity in 173 amino
acids)

SEQ ID NO: 232: -0.370764, 302, a putative cI repressor
protein, similar to cI [Bacteriophage lambda]
gi | 133353 | sp | P03034 | RPC1#LAMBD (70% identity in 208
amino acids)

SEQ ID NO: 233: 0.007584, 357, a putative regulatory protein,
identical to hypothetical protein [Bacteriophage VT2-Sa]
gi | 5881616 | dbj | Baa84307.1 |; similar to c2 [Bacteriophage L]
gi | 1469215 | emb | Caa63999.1 | (42% identity in 49 amino acids)
SEQ ID NO: 234: 0.418519, 55, a regulatory protein CII,
identical to CII protein [Bacteriophage VT2-Sa]
gi | 5881617 | dbj | Baa84308.1 | (100% identity in 98 amino
acids); similar to CII proteins for example, [Enterobacteria
phage HK022] gi | 631957 | pir | | S42398 (96% identity in 98
amino acids)
SEQ ID NO: 235: -0.554044, 273, novel, identical to
hypothetical protein [Enterobacteria phage HK022]
gi | 632160 | pir | | S42399 (100% identity in 48 amino acids);
similar to orf48 [Bacteriophage P22]
gi | 871503 | emb | Caa55155.1 | (85% identity in 48 amino acids)
SEQ ID NO: 236: -0.290062, 162, an endopeptidase (host cell
lysis), similar to endopeptidases for example, [Bacteriophage
lambda] gi | 119368 | sp | P00726 | ENPP#LAMBD (97% identity in
153 amino acids)

SEQ ID NO: 237: -0.084177, 159, a lipoprotein Rz1 precursor,
similar to Rz1 precursors for example, [Bacteriophage lambda]
gi | 540738 | pir | | JN0750 (96% identity in 60 amino acids)
SEQ ID NO: 238: -0.384931, 74, novel, similar to Bor protein
precursors for example, [Bacteriophage lambda]
gi | 137520 | sp | P26814 | VBOR#LAMBD (98% identity in 97
amino acids)

SEQ ID NO: 239: -0.322581, 125, novel, similar to
hypothetical proteins for example, YbcV [Escherichia coli]
gi | 2495556 | sp | P77598 | YBCV#ECOLI (98% identity in 150
amino acids)

SEQ ID NO: 240: -0.276613, 125, novel, identical to YbcW
[Escherichia coli] gi | 2495557 | sp | P75720 | YBCW#ECOLI

SEQ ID NO: 241: 0.049693, 164, novel, similar to
hypothetical proteins for example, [Escherichia coli]
gi | 1778472 | gb | aaB40755.1 | (98% identity in 64 amino acids)
SEQ ID NO: 242: -0.307692, 66, a terminase small subunit,
similar to terminase small subunits for example, Nu1
[Bacteriophage lambda] gi | 139026 | sp | P03707 | TERS#LAMBD
(97% identity in 181 amino acids)

SEQ ID NO: 243: -0.415, 281, a putative terminase large
subunit, similar to terminase large subunits for example,
protein A [Bacteriophage lambda]
gi | 137616 | sp | P03708 | TERL#LAMBD (99% identity in 641
amino acids), GTG start
SEQ ID NO: - : 0.61519, 80, a head-to-tail joining protein,
similar to head-to-tail joining proteins for example, GpW
[Bacteriophage lambda] gi | 138415 | sp | P03727 | VHTJ#LAMBD
(98% identity in 68 amino acids)
SEQ ID NO: 485: -0.691397, 373, a putative portal protein,
similar to portal proteins for example, GpB [Bacteriophage
lambda] gi | 138762 | sp | P03710 | VMCB#LAMBD (98% identity in
533 amino acids)

SEQ ID NO: 486: -0.496629, 90, a minor capsid protein, similar
to minor capsid proteins for example, protein C
[Bacteriophage lambda] gi | 137565 | sp | P03711 | VCAC#LAMBD
(97% identity in 439 amino acids), GTG start, containing
Nu3-homolog

SEQ ID NO: 487: -0.65931, 146, a major capsid protein,
similar to major capsid proteins for example, GpD
[Bacteriophage lambda]
gi | 137566 | sp | P03712 | VCAD#LAMBD (99% identity in 110
amino acids)

SEQ ID NO: 488: 0.03027, 186, a putative major capsid
protein, similar to major capsid proteins for example, GpE
[Bacteriophage lambda] gi | 116752 | sp | P03713 | HEAD#LAMBD
(98% identity in 341 amino acids)

SEQ ID NO: 489: -0.356579, 77, a DNA packaging protein,
similar to DNA packaging proteins for example, GpFI
[Bacteriophage lambda] gi | 139324 | sp | P03709 | VPF1#LAMBD
(98% identity in 132 amino acids)

SEQ ID NO: 490: -0.53038, 159, a minor capsid protein, similar
to minor capsid proteins for example, GpFII [Bacteriophage
lambda] gi | 137575 | sp | P03714 | VCF2#LAMBD (94% identity in
117 amino acids), GTG start

SEQ ID NO: 491: -0.797196, 108, a minor tail protein, similar
to minor tail proteins for example, GpZ [Bacteriophage
lambda] gi | 138849 | sp | P03731 | VMTZ#LAMBD (98% identity in
192 amino acids)
SEQ ID NO: 492: -0.397163, 142, a minor tail protein, similar
to minor tail proteins for example, GpU [Bacteriophage
lambda] gi | 138847 | sp | P03732 | VMTU#LAMBD (100% identity
in 131 amino acids)

SEQ ID NO: 493: -0.69942, 346, a major tail protein V,
similar to major tail proteins for example, GpV
[Bacteriophage lambda] gi | 138848 | sp | P03733 | VMTV#LAMBD
(95% identity in 246 amino acids)

SEQ ID NO: 494: -0.687309, 198, a minor tail protein, similar
to minor tail proteins for example, GpG [Bacteriophage
lambda] gi | 138842 | sp | P03734 | VMTG#LAMBD (96% identity in
140 amino acids)

SEQ ID NO: 495: -0.404622, 239, a putative minor tail
protein, similar to minor tail proteins for example, GpT
[Bacteriophage lambda] gi | 138846 | sp | P03735 | VMTT#LAMBD
(99% identity in 144 amino acids), probably produced by
translational frameshift

SEQ ID NO: 496: -0.494286, 106, a tail length tape measure
protein precursor, similar to tail length tape measure protein
precursors for example, GpH [Bacteriophage lambda]
gi | 138843 | sp | P03736 | VMTH#LAMBD (96% identity in 849
amino acids)

SEQ ID NO: 497: -0.175, 101, a minor tail protein, similar to
minor tail proteins for example, GpM [Bacteriophage
lambda] gi | 138845 | sp | P03737 | VMTM#LAMBD (94% identity in
109 amino acids)

SEQ ID NO: 498: -0.355238, 106, a minor tail protein, similar
to minor tail proteins for example, GpL [Bacteriophage
lambda] gi | 138844 | sp | P03738 | VMTL#LAMBD (98% identity in
232 amino acids)

SEQ ID NO: 499: -0.282857, 106, a tail assembly protein,
similar to tail assembly proteins for example, GpK
[Bacteriophage lambda] gi | 139638 | sp | P03729 | VTAK#LAMBD
(97% identity in 199 amino acids)
SEQ ID NO: 500: -0.675172, 146, a tail assembly protein,
similar to tail assembly proteins for example, GpI
[Bacteriophage lambda] gi | 139637 | sp | P03730 | VTAI#LAMBD
(98% identity in 223 amino acids)

SEQ ID NO: 501: 0.114286, 64, a host specificity protein,
similar to host specificity proteins for example, GpJ
[Bacteriophage lambda] gi | 138412 | sp | P03749 | VHSJ#LAMBD
(89% identity in 1131 amino acids)

SEQ ID NO: 502: -0.550256, 196, a putative membrane
protein precursor, similar to membrane protein Lom precursors
for example, [prophage P-EibA]
gi | 7532789 | gb | aaF63231.1 | AF151091#2 (69% identity in 199
amino acids); [Bacteriophage lambda]
gi | 138693 | sp | P03701 | VLOM#LAMBD (44% identity in 199
amino acids)

SEQ ID NO: 503: 0.15098, 52, a putative tail fiber protein,
similar to putative tail fiber proteins for example, Gp37
[Escherichia coli] gi | 7466858 | pir | | G64887 (95% identity in
496 amino acids)

SEQ ID NO: 504: 0.198571, 71, a tail fiber assembly protein,
similar to tail fiber assembly proteins for example, Orf194
[Bacteriophage lambda] gi | 139990 | sp | P03740 | Y194#LAMBD
(92% identity in 191 amino acids)

SEQ ID NO: 505: -0.96087, 93, novel, similar to hypothetical
proteins for example, putative catalase [Salmonella
typhimurium] gi | 7162108 | emb | CAB76676.1 | (84% identity in
289 amino acids)

SEQ ID NO: 506: -0.407736, 350, novel, similar to
hypothetical proteins for example, YciE [Escherichia coli]
gi | 775201 | gb | aaA65179.1 | (88% identity in 168 amino acids)
SEQ ID NO: 507: -0.273387, 125, novel, similar to
hypothetical proteins for example, YciF [Escherichia coli]
gi | 140432 | sp | P21362 | YCIF#ECOLI (80% identity in 166 amino
acids)
SEQ ID NO: 508: -0.473626, 274, novel, similar to
hypothetical proteins for example, YciG-homolog [Salmonella
typhimurium] gi | 6851081 | emb | CAB71036.1 | (88% identity in
60 amino acids), (also similar to YciG, E. coli [in TONB-TRPA
INTERGENIC REGION])

SEQ ID NO: 509: 0.544262, 62, novel, similar to hypothetical
proteins for example, ybcY [Escherichia coli]
gi | 2495559 | sp | P77460 | YBCY#ECOLI (99% identity in 143
amino acids)

SEQ ID NO: 510: -0.353615, 167, novel, similar to
hypothetical proteins for example, YlcE [Escherichia coli]
gi | 3025212 | sp | P77087 | YLCE#ECOLI (98% identity in 61
amino acids), (similar to orf194, lambda, phage tail assembly
protein)

SEQ ID NO: 511: -0.336744, 646, novel, similar to
hypothetical proteins for example, L0013 [Escherichia coli
O-157:H7 EDL933] gi | 3414881 | gb | aaC31492.1 | (99% identity
in 133 amino acids)

SEQ ID NO: 512: 0.348333, 61, novel, similar to hypothetical
proteins for example, L0014 [Escherichia coli O-157:H7
EDL933] gi | 3414882 | gb | aaC31493.1 | (100% identity in 115
amino acids)

SEQ ID NO: 513: -0.398876, 90, novel, similar to
hypothetical proteins for example, L0015 [Escherichia coli
O-157:H7 EDL933] gi | 3414883 | gb | aaC31494.1 | (100% identity
in 512 amino acids)

SEQ ID NO: 514: 0.087324, 72, a putative fimbrial protein
(partial), similar to truncated BfpA [Escherichia coli]
gi | 4808944 | gb | aaD30026.1 | AF119170#1 (75% identity in 40
amino acids)

SEQ ID NO: 515: -0.027193, 115, novel, similar to
hypothetical proteins for example, [plasmid F]
gi | 8918853 | dbj | Baa97900.1 | (76% identity in 492 amino acids)
SEQ ID NO: 516: -0.440678, 178, an outer membrane protease
precursor, similar to outer membrane protease precursors for
example, protease VII precursor [Escherichia coli]
gi | 129181 | sp | P09169 | OMPT#ECOLI (98% identity in 317
amino acids)

SEQ ID NO: 517: -0.283069, 190, novel, similar to
hypothetical proteins for example, putative DNA-binding
protein [Streptomyces coelicolor A3(2)]
gi | 6855358 | emb | CAB71249.1 | (34% identity in 171 amino
acids)

SEQ ID NO: 518: -0.234839, 156, a transposase, identical to
hypothetical protein [Escherichia coli plasmid p O-157
insertion sequence IS629] gi | 7444868 | pir | | T00241
SEQ ID NO: 519: 0.076471, 69, a transposase, identical to
transposase [Escherichia coli plasmid p O-157 insertion
sequence IS629] gi | 7443862 | pir | | T00240

SEQ ID NO: 520: 0.045946, 75, similar to a part of hypothetical
proteins, for example, YPJA#ECOLI gi | 2507221 | sp | P52143
(amino acids at the position 1336-1569/1569) (96% identity in
234 amino acids), GTG start

SEQ ID NO: 521: -0.288889, 73, novel

SEQ ID NO: 522: 1.11087, 47, a transposase (insertion
sequence IS629), similar to gi | 7443862 | pir | T00240 (96%
identity in 296 amino acids)
SEQ ID NO: 523: -0.714754, 62, a transposase (insertion
sequence IS629), similar to hypothetical proteins for
example, [Shigella flexneri SHI-2 pathogenicity island]
gi | 5532454 | gb | aaD44738.1 | AF141323#9 (98% identity in 108
amino acids)

SEQ ID NO: 524: -0.468595, 122, a putative TonB dependent
outer membrane receptor, similar to TonB dependent outer
membrane receptor PrrA [Escherichia coli CFT073]
gi | 3661477 | gb | aaC61709.1 | (97% identity in 656 amino acids)
SEQ ID NO: 525: -0.648128, 188, a molybdenum transporter
protein, similar to molybdenum transporter proteins for
example, gi | 3661478 | gb | aaC61710.1 | (91% identity in 284
amino acids)

SEQ ID NO: 526: -0.117179, 554, novel 

SEQ ID NO: 527: -0.148992, 646, novel, similar to
hypothetical proteins for example, Orf2 [Escherichia coli
CFT073] gi | 3661479 | gb | aaC61711.1 | (98% identity in 214
amino acids)

SEQ ID NO: 528: -0.435414, 834, a putative ferric
enterobactin transporter, similar to ferric
enterobactin transporter ATP-binding protein [Escherichia coli
CFT073] gi | 3661480 | gb | aaC61712.1 | (79% identity in 148
amino acids)

SEQ ID NO: 529: -0.008333, 109, a putative ABC protein
(permease), similar to ABC transporter permeases for
example, [Haemophilus influenzae]
gi | 2501391 | sp | Q57130 | YE71#HAEIN (40% identity in 323
amino acids)

SEQ ID NO: 530: -0.180172, 117, a putative ABC transporter,
similar to iron (III) ABC transporter, ATP-binding protein
[Pyrococcus abyssi (strain Orsay)] gi | 7519847 | pir | | A75077
(24% identity in 246 amino acids); hypothetical proteins for
example, [Methanosarcina barkeri] gi | 2129363 | pir | | S62196
(26% identity in 259 amino acids)
SEQ ID NO: 531: -0.46554, 149, novel

SEQ ID NO: 532: 0.172807, 115, a putative integrase, similar
to phage integrase family, for example, [Bacteriophage 21]
gi | 138558 | sp | P27077 | VINT#BPP21 (50% identity in 370 amino
acids)

SEQ ID NO: 533: -0.333614, 239, a putative excisionase,
similar to excisionases for example, [Bacteriophage 21]
gi | 139674 | sp | P27079 | VXIS#BPP21 (45% identity in 77 amino
acids)

SEQ ID NO: 534: -0.296774, 125, a putative exonuclease, its
N-terminal part (amino acids at the position 1-256) is similar to
hypothetical proteins for example, ydfE [Escherichia coli
cryptic prophage/truncated insertion sequence IS2 fusion]
gi | 78597 | pir | | S03698 (92% identity in 256 amino acids); its
central part (amino acids at the position 209-622) is similar to
Exodeoxyribonuclease VIII (EC 3.1.11.-) (Exo VIII)
[Escherichia coli] gi | 1742216 | dbj | Baa14950.1 | (39% identity
in 361 amino acids); its C-terminal part (amino acids at the
position 644-776) is similar to exonuclease [phage T4]
gi | 119690 | sp | P04536 | EXOD#BPT4 (27% identity in 133 amino
acids)

SEQ ID NO: 535: -0.091398, 94, novel, similar to hypothetical
protein YdfD [Escherichia coli]
gi | 140587 | sp | P29010 | YDFD#ECOLI (96% identity in 63 amino
acids)

SEQ ID NO: 536: -0.238298, 142, a putative cell division
inhibition protein, similar to cell division inhibitor dicB
[Escherichia coli] gi | 2507009 | sp | P09557 | DICB#ECOLI (93%
identity in 62 amino acids)

SEQ ID NO: 537: -0.317647, 953, novel 

SEQ ID NO: 538: -0.665487, 114, novel
SEQ ID NO: 539: -0.364655, 233, novel, similar to
hypothetical 8.3 kDa protein YdfC [Escherichia coli]
gi | 140586 | sp | P21418 | YDFC#ECOLI (94% identity in 72 amino
acids)

SEQ ID NO: 540: -0.672619, 85, a putative repressor protein
of division inhibition gene dicB, similar to DicA repressor
protein of division inhibition gene dicB [Escherichia coli]
gi | 118631 | sp | P06966 | DICA#ECOLI (63% identity in 131 amino
acids); its N-terminal part (amino acids at the position 1-68)
is similar to N-terminal part of protein
[Bacteriophage P22] gi | 133359 | sp | P03035 | RPC2#BPP22 (61%
identity in 68 amino acids)

SEQ ID NO: 541: -0.47226, 293, a putative repressor protein
of division inhibition gene dicB, similar to DicC repressor
protein of division inhibition gene dicB [Escherichia coli]
gi | 118633 | sp | P06965 | DICC#ECOLI (82% identity in 74 amino
acids); its N-terminal part (amino acids at the position 1-57)
is similar to (at low level) Cro [Bacteriophage P22]
gi | 132195 | sp | P09964 | RCRO#BPP22 (36% identity in 57 amino
acids)

SEQ ID NO: 542: -0.389388, 246, novel, similar to
hypothetical 11.0 kDa protein YdfX [Escherichia coli]
gi | 3183265 | sp | P76165 | YDFX#ECOLI (87% identity in 93
amino acids)

SEQ ID NO: 543: -0.211702, 95, novel, similar to replication
termination factor (prepriming protein I) DnaT [Escherichia
coli] gi | 1361001 | pir | | S56589 (51% identity in 83 amino acids)
SEQ ID NO: 544: -0.145524, 783, a putative phage replication
protein, similar to phage replication proteins for example,
protein 14 [phage phi-80] gi | 137937 | sp | P14814 | VG14#BPPH8
(48% identity in 129 amino acids)

SEQ ID NO: 545: -0.473433, 1134, a putative fimbrial minor
pilin protein precursor, similar to N-terminal part of fimbrial
minor pilin protein precursors for example, Pap-related pilus
H [Escherichia coli] gi | 837337 | gb | aaA67692.1 | (75% identity in
56 amino acids), GTG start, probably interrupted by frame shift
SEQ ID NO: 546: 0.168627, 52, a fimbrial minor pilin protein
precursor (partial), similar to C-terminal part of fimbrial minor
pilin protein precursors, for example, PrsH [Escherichia coli]
gi | 1172646 | sp | P42185 | PRSH#ECOLI (62% identity in 50
amino acids)

SEQ ID NO: 547: 0.350336, 150, a putative colonization factor,
identical to Anm (attachment and effacement of negative
mutant) protein [Escherichia coli]
gi | 6715555 | gb | aaB48445.2 | (100% identity in 252 amino
acids); similar to accessory colonization factor AcfC [Vibrio
cholerae] gi | 558481 | gb | aaA50604.1 | (50% identity in 239
amino acids)
SEQ ID NO: 548: -0.544186, 302, a putative toxic protein
(prophage maintenance; modulation host cell killing), similar to
Hok/Gef family for example, Gef [Escherichia coli]
gi | 2120017 | pir | | S4054G (79% identity in 69 amino acids)
SEQ ID NO: 549: -0.409434, 54, novel, similar to Rem protein
[Escherichia coli] gi | 132324 | sp | P07010 | REM#ECOLI (71%
identity in 84 amino acids)

SEQ ID NO: 550: -0.517544, 58, novel, similar to (at low
level) orf QD1 [Bacteriophage N15]
gi | 2564084 | gb | aaB81659.1 | (33% identity in 64 amino acids)
SEQ ID NO: 551: -0.641758, 92, novel, similar to hypothetical
protein b1560 [Escherichia coli] gi | 7466196 | pir | | C64911 (86%
identity in 347 amino acids); similar to hypothetical protein A
[phage P1] gi | 732234 | sp | Q06262 | YORA#BPP1 (85% identity in
347 amino acids), GTG start

SEQ ID NO: 552: -0.407064, 454, a putative crossover
junction endodeoxyribonuclease, similar to Gp67 [Bacteriophage
HK97] gi | 6901639 | gb | aaF31142.1 | (60% identity in 113 amino
acids); crossover junction endodeoxyribonucleases for
example, Rus [Escherichia coli cryptic lambdoid prophage
DLP12] gi | 2507117 | sp | P40116 | RUS#ECOLI (39% identity in
115 amino acids)

SEQ ID NO: 553: -0.475714, 71, novel

SEQ ID NO: 1213: -0.410758, 410, novel [hypothetical
lipoprotein], its C-terminal part is similar to orf2
[Bacteriophage P27] gi | 8346569 | emb | CAB93762.1 | (98%
identity in 63 amino acids), GTG start

SEQ ID NO: 1214: -0.622581, 63, a putative DNA methylase,
similar to orf3 [Bacteriophage P27]
gi | 8346570 | emb | CAB93763.1 | (85% identity in 312 amino
acids); similar to adenine specific modification methylases for
example, Gp52 [phage N15] gi | 7433503 | pir | | T13139 (55%
identity in 270 amino acids)

SEQ ID NO: 1215: -0.359514, 248, novel, similar to
hypothetical proteins for example, [Bacteriophage 933W]
gi | 4585419 | gb | aaD25447.1 | AF125520#42 (52% identity in 613
amino acids)

SEQ ID NO: 1216: 0.293846, 66, a putative holin protein,
similar to holin proteins for example, [Bacteriophage 933W]
gi | 4499808 | emb | CAB39307.1 | (95% identity in 71 amino acids)
SEQ ID NO: 1217: 0.377049, 62, novel, similar to
hypothetical protein YdfR [Escherichia coli]
gi | 3183262 | sp | P76160 | YDFR#ECOLI (45% identity in 74
amino acids)

SEQ ID NO: 1218: -0.180952, 64, a putative endolysin,
similar to endolysins for example, [Bacteriophage 933W]
gi | 4585422 | gb | aaD25450.1 | AF125520#45 (96% identity in 177
amino acids)

SEQ ID NO: 1219: -0.23625, 81, a putative antirepressor
protein, identical to putative antirepressor protein Ant
[Bacteriophage 933W] gi | 4585423 | gb | aaD25451.1 | AF125520;
similar to antirepressor protein Ant [Bacteriophage
P22] gi | 131843 | sp | P03037 | RANT#BPP22 (49% identity in 126
amino acids)

SEQ ID NO: 1220: -0.936364, 100, endopeptidase (host lysis),
identical to Rz [Bacteriophage VT2-Sa]
gi | 5881639 | dbj | Baa84330.1 |; similar to Rz endopeptidases for
example, [Bacteriophage lambda]
gi | 119368 | sp | P00726 | ENPP#LAMBD (69% identity in 153
amino acids)

SEQ ID NO: 1221: -0.548598, 322, a lipoprotein Rz1 precursor,
similar to Rz1 protein precursors for
example, [Bacteriophage 933W]
gi | 4585425 | gb | aaD25453.1 | AF125520#48 (98% identity in 61
amino acids); [Bacteriophage lambda]
gi | 540738 | pir | | JN0750 (70% identity in 61 amino acids)
SEQ ID NO: 1222: -0.179452, 74, novel, similar to
hypothetical proteins for example, [Escherichia coli]
gi | 1778472 | gb | aaB40755.1 | (70% identity in 67 amino acids)
SEQ ID NO: 1223: -0.636194, 269, a putative DNase, similar
to (at low level) putative DNase [Bacteriophage phi-C31]
gi | 1107475 | emb | Caa62587.1 | (28% identity in 85 amino acids)

SEQ ID NO: 1224: 0.322807, 115, novel

SEQ ID NO: 1225: -0.454217, 84, a putative terminase small
subunit, similar to (at low level) putative terminase small
subunit [Bacillus subtilis PBSX phage]
gi | 1722886 | sp | P39785 | XTMA#BACSU (42% identity in 57
amino acids), GTG start

SEQ ID NO: 1226: -0.484559, 137, a putative terminase large
subunit, similar to phage D3 terminase-like protein
[Haemophilus influenzae]
gi | 6739656 | gb | aaF27357.1 | AF198256#11 (22% identity in 472
amino acids)

SEQ ID NO: 1227 : -0.942222, 91, a putative head 
protein/prohead protease, its N-terminal part is similar to
putative prohead proteases, for example, [Bacteriophage
HK97] gi | 1722780 | sp | P49860 | VP4#BPHK7 (28% identity in
136 amino acids); its C-terminal part is similar to major head
protein [Mycobacterium phage L5]

gi | 465114 | sp | Q05223 | VG17#BPML5 (23% identity in 280 
amino acids), GTG start 

SEQ ID NO: 1228: "0.382433, 75, novel 

SEQ ID NO: 1229: -0.597662, 386, a putative portal protein,
its N-terminal-half part is similar to head portal proteins,
for example, [Bacteriophage HK022]
gi | 6863114 | gb | aaF30355.1 | AF069308#3 (26% identity in 351
amino acids); its C-terminal-half part is similar to
C-terminal-half part of putative transducer protein [H.
salinarum] gi | 3913878 | sp | Q48317 | HTR4#HALSA (21% identity
in 347 amino acids)

SEQ ID NO: 1230: -0.524865, 186, novel 

SEQ ID NO: 1231 : -0.486352, 404, a putative head-tail 
adaptor, similar to putative head-tail adaptors, for
example, [Bacteriophage HK97] gi | 6901597 | gb | aaF31100.1 |
(45% identity in 111 amino acids) 

SEQ ID NO: 1232: -0.194643, 113, novel, similar to phage 
hypothetical proteins, for example, Gp10 [Bacteriophage HK97]
gi | 6901598 | gb | aaF31101.1 | (75% identity in 148 amino acids)
SEQ ID NO: 1233: 0.009184, 99, novel, similar to Gp11
[Bacteriophage HK97] gi | 6901599 | gb | aaF31102.1 | (49%
identity in 113 amino acids) 

SEQ ID NO: 1234 : -1.106849, 147, a putative major tail 
subunit, similar to major tail subunit [Bacteriophage HK97]
gi | 6901588 | gb | aaF31091.1 | AF069529#4 (65% identity in 234
amino acids), GTG start 

SEQ ID NO: 1235: -1.563158, 58, a putative tail assembly
chaperone, similar to putative tail assembly chaperones, for
example, p14 [Bacteriophage HK97]
gi | 6901600 | gb | aaF31103.1 | (62% identity in 124 amino acids)
SEQ ID NO: 1236 : -0.692373, 119, novel, similar to
C-terminal part of Gp14 [Bacteriophage HK97]
gi | 6901601 | gb | aaF31104.1 | (60% identity in 94 amino acids),
probably produced by translational frameshift

SEQ ID NO: 1237: -0.32604, 554, a putative tail length tape 
measure protein, similar to tail length tape measure 
proteins, for example, [Bacteriophage HK97]
gi | 6901589 | gb | aaF31092.1 | AF069529#5 (52% identity in 1022
amino acids)

SEQ ID NO: 1238 : -0.727957, 94, a putative minor tail 
protein, similar to minor tail proteins, for example, GpM
[Bacteriophage lambda] gi | 138845 | sp | P03737 | VMTM#LAMBD
(44% identity in 110 amino acids), GTG start

SEQ ID NO: 1239 : -0.284615, 92, a putative minor tail
protein, similar to minor tail proteins, for example, GpL
[Bacteriophage lambda] gi | 138844 | sp | P03738 | VMTL#LAMBD
(72% identity in 137 amino acids) 
SEQ ID NO: 631: -0.709473, 381, a putative host specificity
protein, similar to host specificity proteins, for example, GpJ
[Bacteriophage lambda] gi | 138412 | sp | P03749 | VHSJ#LAMBD
(69% identity in 1157 amino acids) 

SEQ ID NO: 632: -0.351282, 79, a putative outer membrane 
protein precursor, similar to outer membrane protein Lom
precursors, for example, [prophage P-EibA]

gi | 7532789 | gb | aaF63231.1 | AF151091#2 (77% identity in 199
amino acids); [Bacteriophage lambda] 

gi | 138693 | sp | P03701 | VLOM#LAMBD (40% identity in 199 
amino acids) 

SEQ ID NO: 633 : -0.545985, 275, a putative tail fiber
protein, similar to tail fiber proteins for 
example , [Bacteriophage 933W] 

gi | 4585436 | gb | aaD25464.1 | AF125520#59 (38% identity in 370 
amino acids) 

SEQ ID NO: 634 : -0.471244, 234, novel, similar to
hypothetical protein [Bacteriophage 933W]

gi | 4585437 | gb | aaD25465.1 | AF125520#60 (92% identity in 89 
amino acids) 

SEQ ID NO: 635: -0.194, 101, novel 
SEQ ID NO: 636: 1.042727, 111, novel, similar to hypothetical
proteins, for example, Orf2 [Escherichia coli strain B171-8]
gi | 4126792 | dbj | Baa36750.1 | (37% identity in 111 amino acids)
SEQ ID NO: 637 : -0.138976, 509, novel

SEQ ID NO: 638 : -0.319205, 152, an integrase, similar to 
integrases, for example, [Bacteriophage HK022]
gi | 138560 | sp | P16407 | VINT#BPHK0 (89% identity in 229
amino acids), possibly comprising the deletion of 100 amino acids
at N-terminus 

SEQ ID NO: 639: "0.625, 57, novel 
SEQ ID NO: 640 : -0.083333, 97, novel

SEQ ID NO: 641 : -0.538333, 121, disrupted transposase, 
similar to C-terminal of putative transposases, for
example, [Yersinia pestis plasmid pMT1]
gi | 2996347 | gb | aaC13227.1 | (74% identity in 89 amino acids),
TTG start

SEQ ID NO: 642 : -0.450655, 230, a disrupted transposase, 
similar to C-terminal part of putative transposases, for example, 
[Yersinia pestis plasmid pMT1] gi | 7447905 | pir | | T14710 (70%
identity in 90 amino acids), comprising the deletion of
N-terminal part (-180 amino acids)

SEQ ID NO: 643: 0.76381, 106, novel, identical to L0015
[Escherichia coli O157:H7 strain EDL933]
gi | 3414883 | gb | aaC31494.1 |

SEQ ID NO: 644: -0.675317, 159, novel, identical to L0014
[Escherichia coli O157:H7 strain EDL933]
gi | 3288157 | emb | Caa11510.1 |

SEQ ID NO: 645: -0.396079, 154, novel, identical to L0013
[Escherichia coli O157:H7 strain EDL933]
gi | 3414881 | gb | aaC31492.1 |

SEQ ID NO: 646: 0.016667, 61, novel

SEQ ID NO: 647: 0.228866, 98, novel, similar to (at low level)
hypothetical protein [insertion sequence IS630]
gi | 140943 | sp | P16943 | YIS5#SHISO (47% identity in 25 amino
acids), TTG start

SEQ ID NO: 648: -0.455333, 151, novel

SEQ ID NO: 649: -0.113235, 69, novel, similar to hypothetical 
proteins, for example, orf2 [Escherichia coli strain B171-8]
gi | 4126790 | dbj | Baa36748.1 | (63% identity in 206 amino
acids) 

SEQ ID NO: 650: -1.015625, 65, bfpT-regulated chaperone-like
protein, similar to TrcA (bfpT-regulated
chaperone-like protein)-like proteins, for
example, TrcA [Escherichia coli strain B171-8]
gi | 4126789 | dbj | Baa36747.1 | (72% identity in 195 amino
acids)

SEQ ID NO: 651: -0.513812, 182, novel, partially similar to 
hypothetical protein [insertion sequence IS630]
gi | 140943 | sp | P16943 | YIS5#SHISO (81% identity in 60 amino
acids), GTG start, probably disrupted 

SEQ ID NO: 652: -0.585648, 642, novel, similar to N-terminal
part of hypothetical 39 kDa protein [insertion element IS630]
gi | 1143207 | gb | aaA84873.1 | (82% identity in 54 amino acids)
SEQ ID NO: 653: -0.526471, 69, novel, similar to hypothetical
protein ORF2 [Escherichia coli strain B171-8]
gi | 4126790 | dbj | Baa36748.1 | (38% identity in 167 amino
acids); ORF4 [Escherichia coli strain B171-8]
gi | 4126792 | dbj | Baa36750.1 | (40% identity in 127 amino acids)
SEQ ID NO: 654: -0.431519, 534, a putative transcription 
regulatory protein, similar to transcription regulatory proteins,
for example, UidR [Escherichia coli]
gi | 2495429 | sp | Q59431 | UIDR#ECOLI (30% identity in 123
amino acids) 

SEQ ID NO: 655 : "0.048747, 440, a putative 

multidrug-efflux transporter protein precursor, similar to
multidrug-efflux transporter protein precursors, for
example, AcrA [Escherichia coli K-12]

gi | 399000 | sp | P31223 | ACRA#ECOLI (51% identity in 358 
amino acids) 

SEQ ID NO: 656 : -0.159091, 111, a putative 

multidrug-efflux transporter protein, similar to
multidrug-efflux transporter proteins, for example, AcrB
[Escherichia coli K-12] gi | 399001 | sp | P31224 | ACRB#ECOLI 
(56% identity in 974 amino acids) 

SEQ ID NO: 657: -0.38651, 342, a putative outer membrane 
channel protein, similar to outer membrane channel proteins,
for example, OprM [Pseudomonas aeruginosa]
gi | 3184190 | dbj | Baa28694.1 | (43% identity in 448 amino acids)
SEQ ID NO: 658 : -0.231818, 133, a putative membrane 
transporter protein, similar to membrane transporter protein 
for example, [Streptomyces coelicolor A3(2)]
gi | 6469269 | emb | CAB61730.1 | (38% identity in 380 amino
acids) 

SEQ ID NO: 659 : -0.434188, 118, novel, similar to 
hypothetical protein [Xylella fastidiosa] 

gi | 9106817 | gb | aaF84556.1 | AE003997#12 (38% identity in 209 
amino acids) 

SEQ ID NO: 660: -0.471354, 193, similar to C-terminal part of 
B1327#ECOLI gi | 1787587 (amino acids at the position
224-310/310) (33% identity in 87 amino acids)

SEQ ID NO: 661: -0.156489, 132, similar to N-terminal part of 
B1327#ECOLI gi | 1787587 (amino acids at the position
22-123/310) (62% identity in 113 amino acids)

SEQ ID NO: 662 : -0.247561, 247, a transposase (insertion
sequence IS629), identical to gi | 7443862 | pir | | T00240

SEQ ID NO: 663 : -0.355, 141, a transposase (insertion
sequence IS629), identical to gi | 7444868 | pir | | T00241

SEQ ID NO: 664: -0.182639, 145, a putative regulatory
element, similar to (at low level) regulatory proteins, for
example, regulatory protein CI (235 amino acids)
[Bacteriophage HK022] gi | 1350835 | sp | P18680 (42% identity in
66 amino acids)

SEQ ID NO: 665 : -0.463487, 850, a putative regulatory 
element, similar to Cro [Bacteriophage HK022] 
gi | 1350553 | sp | P18679 (61% identity in 73 amino acids)
SEQ ID NO: 666: -0.314679, 110, its C-terminal part (amino 
acids at the position 139-262 / 262) is similar to C-terminal 
part of YDAU#ECOLI gi | 1787622 (amino acids at the position
162-285 / 285) (79% identity in 124 amino acids) 



SEQ ID NO: 667: -0.4625, 233, novel
SEQ ID NO: 668: -0.390688, 248, novel
SEQ ID NO: 669: 0.20583, 224, novel
SEQ ID NO: 670: -0.342491, 1133, novel
SEQ ID NO: 671: -0.326633, 200, novel, similar to N-terminal
part of Eaa protein [Bacteriophage P22]
gi | 418207 | sp | Q03544 | VEaa#BPP22 (88% identity in 42 amino
acids) 

SEQ ID NO: 672: -0.27899, 972, novel, partially similar to
hypothetical protein [Bacteriophage H19J]
gi | 4490348 | emb | CAB38711.1 | (70% identity in 54 amino
acids); partially similar to part of Gp45 [Bacteriophage N15]
gi | 7521552 | pir | | T13131 (57% identity in 47 amino acids)
SEQ ID NO: 673: -0.308808, 194, possible methyltransferase,
similar to methyltransferases, for example, cytosine-specific
methyltransferase XorII [Xanthomonas oryzae pv.]
gi | 1709171 | sp | P52311 | MTX2#XANOR (40% identity in 365
amino acids) 

SEQ ID NO: 674: -0.40473, 297, novel, similar to (at low 
level) hypothetical protein HI0983 [Haemophilus influenzae]
gi | 1074592 | pir | | D64163 (26% identity in 138 amino acids)
SEQ ID NO: 675: -0.432143, 169, novel

SEQ ID NO: 676: -0.448193, 167, novel, similar to Orf79 
[Bacteriophage D3] gi | 8895177 | gb | aaF80835.1 | (36% identity 
in 199 amino acids) 

SEQ ID NO: 677: -1.706667, 61, novel, similar to hypothetical 
proteins, for example, YbcO [Escherichia coli cryptic prophage
DLP12] gi | 7467043 | pir | | C64787 (57% identity in 96 amino
acids); Gp66 [Bacteriophage HK97]
gi | 6901638 | gb | aaF31141.1 | (56% identity in 94 amino acids)
SEQ ID NO: 678: -0.237063, 144, a putative antiterminator,
similar to (at low level) antiterminator protein Q [Bacteriophage
21] gi | 4539484 | emb | CAB39993.1 | (22% identity in 168 amino 
acids) 

SEQ ID NO: 679: -0.446341, 83, novel, similar to putative 
TerB proteins, for example, [Deinococcus radiodurans]
gi | 7473690 | pir | | C75302 (26% identity in 129 amino acids)
SEQ ID NO: 680: -0.403175, 127, novel, GTG start
SEQ ID NO: 681 : 0.010435, 116, novel, similar to
hypothetical proteins, for example, [Bacteriophage 933W]
gi | 4585419 | gb | aaD25447.1 | AF125520#42 (53% identity in 613
amino acids)

SEQ ID NO: 682 : -0.445312, 513, a putative holin protein
(host cell lysis), similar to holin proteins, for
example, [Bacteriophage 933W] gi | 4499808 | emb | CAB39307.1 |
(91% identity in 71 amino acids)
SEQ ID NO: 683: -0.57037, 55, novel, similar to hypothetical
protein [Escherichia coli] gi | 3183262 | sp | P76160 | YDFR#ECOLI
(43% identity in 74 amino acids)

SEQ ID NO: 684: -0.313158, 495, an endolysin (host cell lysis),
similar to endolysins, for example, [Bacteriophage 933W]
gi | 4335686 | gb | aaD17382.1 | (96% identity in 177 amino acids)
SEQ ID NO: 685: -0.652681, 318, a putative antirepressor,
identical to putative antirepressor [Bacteriophage 933W]
gi | 4585423 | gb | aaD25451.1 | AF125520#46 (100% identity in
189 amino acids); its N-terminal part (amino acids at the
position 1-126) is similar to antirepressor protein Ant
[Bacteriophage P22] gi | 131843 | sp | P03037 | RANT#BPP22 (49% 
identity in 126 amino acids) 

SEQ ID NO: 686: -0.24433, 195, an endopeptidase (host cell
lysis), similar to endopeptidases, for example, [Bacteriophage
VT2-Sa] gi | 5881639 | dbj | Baa84330.1 | (100% identity in 155
amino acids) 

SEQ ID NO: 687 : -0.965741, 109, novel, similar to 
hypothetical protein [Escherichia coli] 

gi | 1778472 | gb | aaB40755.1 | (70% identity in 67 amino acids);
hypothetical protein [Salmonella dublin]
gi | 3511132 | gb | aaC33722.1 | (70% identity in 49 amino acids)
SEQ ID NO: 688: -0.397973, 297, a putative DNase, similar to
(at low level) gp30 (DNase) [Bacteriophage phi-C31]
gi | 1107475 | emb | Caa62587.1 | (28% identity in 85 amino acids);
similar to (at low level) TerF-related protein [Deinococcus
radiodurans] gi | 7473956 | pir | | C75599 (33% identity in 72
amino acids)
SEQ ID NO: - : -0.413248, 235, novel

SEQ ID NO: - : 0.280303, 67, a putative terminase small 
subunit, similar to phage terminase small subunits, for
example, [Bacillus subtilis PBSX]

gi | 1722886 | sp | P39785 | XTMA#BACSU (34% identity in 52 
amino acids) 

SEQ ID NO: 1641: -0.383784, 297, a putative terminase large 
subunit, similar to phage hypothetical proteins, for
example, phage D3 terminase-like protein [Haemophilus
influenzae] gi | 6739656 | gb | aaF27357.1 | AF198256#11 (22%
identity in 57 amino acids) 

SEQ ID NO: 1642 : -0.942593, 109, a phage major head 
protein/prohead protease, its C-terminal part is similar to
major head proteins, for example, [Mycobacterium phage L5]
gi | 465114 | sp | Q05223 | VG17#BPML5 (22% identity in 306
amino acids); its N-terminal part is similar to putative
prohead proteases, for example, [Rhodobacter capsulatus]
gi | 6467535 | gb | aaF13181.1 | AF181080#3 (30% identity in 133
amino acids); similar to putative prohead protease
[Rhodobacter capsulatus]
gi | 6467535 | gb | aaF13181.1 | AF181080#3 (30% identity in 133
amino acids), GTG start
SEQ ID NO: - : -0.615396, 657, novel

SEQ ID NO: 1419: 0.067253, 285, a putative portal protein, 
similar to phage portal proteins for example , [Bacteriophage 
D3] gi | 5059250 | gb | aaD38955.1 | (24% identity in 366 amino
acids)

SEQ ID NO: 1420 : -0.121505, 94, novel
SEQ ID NO: 1421 : -0.211215, 215, novel 

SEQ ID NO: 1422: 0.150397, 253, a putative phage head-tail 
adaptor, similar to head-tail adaptors for 
example, [Bacteriophage HK97] gi | 6901597 | gb | aaF31100.1 |
(44% identity in 111 amino acids)

SEQ ID NO: 1423 : 0.99049, 327, novel, similar to phage 
hypothetical proteins, for example, Gp10 [Bacteriophage HK97]
gi | 6901598 | gb | aaF31101.1 | (75% identity in 148 amino acids)
SEQ ID NO: 1424: -0.024118, 341, novel, similar to Gp11
[Bacteriophage HK97] gi | 6901599 | gb | aaF31102.1 | (49%
identity in 113 amino acids) 

SEQ ID NO: 1425: 0.580303, 67, a major tail subunit, similar 
to major tail subunit [Bacteriophage HK97] 
gi | 6901588 | gb | aaF31091.1 | AF069529#4 (67% identity in 234
amino acids)

SEQ ID NO: 338: -0.622872, 377, a putative tail assembly
chaperone, similar to tail assembly chaperone Gp14
[Bacteriophage HK97] gi | 6901600 | gb | aaF31103.1 | (62%
identity in 124 amino acids)

SEQ ID NO: 339: -0.239024, 83, novel, similar to C-terminal
part of Gp14 [Bacteriophage HK97]
gi | 6901601 | gb | aaF31104.1 | (60% identity in 94 amino
acids), probably produced by translational frameshift
SEQ ID NO: 340: -0.7548, 824, a putative tail length tape
measure protein, similar to tail length tape measure
proteins, for example, [Bacteriophage HK97]
gi | 6901589 | gb | aaF31092.1 | AF069529#5 (52% identity in 1022
amino acids) 

SEQ ID NO: 341 : 0.230159, 64, a putative tail component, 
similar to minor tail proteins, for example, GpM
[Bacteriophage lambda] gi | 138845 | sp | P03737 | VMTM#LAMBD
(45% identity in 110 amino acids), GTG start

SEQ ID NO: 342: -0.180645, 63, a putative tail component,
similar to minor tail proteins, for example, GpL
[Bacteriophage lambda] gi | 138844 | sp | P03738 | VMTL#LAMBD
(75% identity in 232 amino acids)

SEQ ID NO: 343: -0.133766, 78, a putative tail assembly, 
similar to tail assembly proteins for example ,GpK 
[Bacteriophage lambda] gi | 6901605 | gb | aaF31108.1 | (35% 
identity in 226 amino acids)
SEQ ID NO: 344: -0.166667, 136, a putative tail assembly,
similar to tail assembly proteins, for example, GpI
[Bacteriophage lambda] gi | 139637 | sp | P03730 | VTAI#LAMBD
(69% identity in 224 amino acids) 

SEQ ID NO: 345: -0.626389, 73, novel

SEQ ID NO: 346 : -0.679259, 136, a putative superoxide 
dismutase, similar to copper/zinc-superoxide dismutases, for
example, [Salmonella typhimurium]
gi | 2462899 | emb | Caa73588.1 | (58% identity in 175 amino
acids)

SEQ ID NO: 347 : -0.498667, 76, a putative phage host 
specificity protein, similar to host specificity proteins for 
example, GpJ [Bacteriophage lambda]
gi | 138412 | sp | P03749 | VHSJ#LAMBD (70% identity in 1164
amino acids)

SEQ ID NO: 348: -0.345355, 184, similar to outer membrane 
proteins, for example, Lom protein [Bacteriophage P-EibA]
dadj AF151091-2 | aaF63231.1 | (68% identity in 199 amino
acids) 

SEQ ID NO: 349 : -0.672832, 347, a putative tail fiber
protein, similar to putative tail fiber proteins, for
example, [Bacteriophage 933W]
gi | 4585436 | gb | aaD25464.1 | AF125520#59 (AF125520) (34%
identity in 233 amino acids), GTG start

SEQ ID NO: 350 : -0.670588, 222, novel, similar to
hypothetical protein [Bacteriophage 933W]
gi | 4585437 | gb | aaD25465.1 | AF125520#60 (94% identity in 129
amino acids), GTG start 

SEQ ID NO: 351 : -0.268932, 104, novel, similar to ORF4 
[Escherichia coli strain B171-8] gi | 4126792 | dbj | Baa36750.1 |
(35% identity in 116 amino acids); ORF2 [Escherichia coli
strain B171-8] gi | 4126790 | dbj | Baa36748.1 | (28% identity in
171 amino acids) 

SEQ ID NO: 352 : -0.120755, 54, novel, similar to ORF4




[Escherichia coli strain B171-8] gi | 4126792 | dbj | Baa36750.1 |
(91% identity in 135 amino acids); ORF2 [Escherichia coli
strain B171-8] gi | 4126790 | dbj | Baa36748.1 | (43%
identity in 205 amino acids) 

SEQ ID NO: 353 : -0.368651, 253, novel, similar to ORF4 
[Escherichia coli B171-8] gi | 4126792 | dbj | Baa36750.1 | (41%
identity in 135 amino acids); ORF2 [Escherichia coli strain
B171-8] gi | 4126790 | dbj | Baa36748.1 | (36% identity in 126
amino acids) 

SEQ ID NO: 354 : 0.292857, 71, similar to YDBL#ECOLI 
gi | 1787648 (71% identity in 109 amino acids), but comprising
different N-terminal part and C-terminal part 

SEQ ID NO: 355 : 0.012941, 86, a putative ABC-type 
transporter protein, similar to N-terminal part of ABC-type
transporter protein YdbA.2 [Escherichia coli]
gi | 7465766 | pir | | C48399 (amino acids at the position 1-1128 /
2020) (49% identity in 1011 amino acids) 

SEQ ID NO: 356 : -1.156522, 93, a putative ABC-type 
transporter protein, similar to C-terminal part of ABC-type 
transporter protein YdbA.2 [Escherichia coli] 

gi | 7465766 | pir | | C48399 (amino acids at the position
1220-2020/2020) (77% identity in 806 amino acids) 
SEQ ID NO: 357: -0.396839, 349, novel 
SEQ ID NO: 358: "0.287395, 120, novel 
SEQ ID NO: 359: -0.428409, 177, novel 

SEQ ID NO: 360 : 0.049057, 107, novel

SEQ ID NO: 361 : -0.469602, 353, novel, similar to Vgr 
proteins for example ,VgrE protein [Escherichia coli] 
gi | 2920625 | gb | aaC32465.1 | (98% identity in 702 amino acids)
SEQ ID NO: 362: -0.206969, 618, a Rhs protein, similar to
Rhs core proteins, for example, RhsD [Escherichia coli]
gi | 1786706 (92% identity in 1281 amino acids) (Conserved in
E.coli K-12) 

SEQ ID NO: 363: 0.095775, 72, novel, similar to (at low level) 
IpaH protein IPA7#SHIFL gi | 124813 | sp | P18014 (35% identity
in 120 amino acids); YDDK#ECOLI gi | 3183258 | sp | P76123 (32%
identity in 100 amino acids) 

SEQ ID NO: 364: -0.074561, 115, similar to outer membrane 
porin precursors for example ,NMPC#ECOLI gi | 1786765 (67% 
identity in 343 amino acids), but comprising different
N-terminal part

SEQ ID NO: 365: -0.466667, 178, novel, GTG start 
SEQ ID NO: 366 : -0.283069, 190, a putative fimbrial
chaperone protein precursor, similar to fimbrial chaperone
protein precursors, for example, FocC [Escherichia coli]
gi | 1169720 | sp | P46008 | FOCC#ECOLI (67% identity in 206 amino
acids) 

SEQ ID NO: 367: -0.472903, 156, a putative type 1 fimbrial 
protein precursor, similar to type 1 fimbrial protein precursors 
for example, [Escherichia coli]
gi | 729528 | sp | P04128 | FM1A#ECOLI (64% identity in 186 amino
acids) 

SEQ ID NO: 368 : 0.214754, 62, novel, GTG start
SEQ ID NO: 369: -0.717334, 76, a putative regulatory element,
similar to araC-family transcription regulatory element AdpA
[Streptomyces coelicolor A3(2)] gi | 7544056 | emb | CAB87229.1
(41% identity in 316 amino acids) 

SEQ ID NO: 370: -0.468595, 122, a damage-inducible protein, 
similar to damage-inducible proteins, for example, DinI
[Escherichia coli] gi | 2498305 | sp | Q47143 | DINI#ECOLI (36%
identity in 72 amino acids)

SEQ ID NO: 371 : -1.029787, 48, novel, similar to hypothetical 
proteins for example ,ORF4 [Escherichia coli] 

gi | 4126792 | dbj | Baa36750.1 | (43% identity in 131 amino
acids); ORF2 [Escherichia coli] gi | 4126790 | dbj | Baa36748.1 |
(35% identity in 126 amino acids)

SEQ ID NO: 372 : -0.648128, 188, novel, similar to 
hypothetical proteins, for example, ORF4 [Escherichia coli]
gi | 4126790 | dbj | Baa36748.1 | (43% identity in 206 amino
acids); ORF2 [Escherichia coli] gi | 4126792 | dbj | Baa36750.1 |
(91% identity in 135 amino acids)

SEQ ID NO: 373 : -0.117179, 554, novel, similar to 
hypothetical proteins, for example, ORF4 [Escherichia coli]
gi | 4126792 | dbj | Baa36750.1 | (34% identity in 116 amino
acids); ORF2 [Escherichia coli] gi | 4126790 | dbj | Baa36748.1 |
(28% identity in 171 amino acids)

SEQ ID NO: 374 : -0.148992, 646, novel, similar to 
hypothetical protein [Bacteriophage 933W] 

gi | 4585437 | gb | aaD25465.1 | AF125520#60 (93% identity in 89
amino acids)

SEQ ID NO: 375: -0.831147, 62, a putative tail fiber protein, 
similar to C-terminal part of putative tail fiber protein
[Bacteriophage 933W]
gi | 4585436 | gb | aaD25464.1 | AF125520#59 (100% identity in 92
amino acids), GTG start, probably disrupted

SEQ ID NO: 376: -0.483469, 860, a putative tail fiber protein, 
similar to N-terminal part of tail fiber proteins for 
example, Gp37 [Escherichia coli] gi | 7466858 | pir | | C64887 (57%
identity in 271 amino acids); orf-40.1 [Bacteriophage lambda]
gi | 140053 | sp | P03764 | Y401#LAMBD (56% identity in 269
amino acids), probably interrupted 

SEQ ID NO: 377 : -0.061111, 109, a putative outer host 
membrane protein precursor, similar to Lom-like proteins for 
example, [prophage P-EibA]
gi | 7532789 | gb | aaF63231.1 | AF151091#2 (68% identity in 199
amino acids); Lom [Bacteriophage lambda]
gi | 138693 | sp | P03701 | VLOM#LAMBD (44% identity in 199
amino acids) 

SEQ ID NO: 378: -0.192241, 117, a phage tail protein (host 
specificity protein), similar to host specificity proteins, for
example, GpJ [Bacteriophage
lambda] gi | 138412 | sp | P03749 | VHSJ#LAMBD (65% identity in
1158 amino acids) 

SEQ ID NO: 379 : -0.512838, 149, a tail assembly protein, 
similar to tail assembly proteins, for example, GpI
[Bacteriophage lambda] gi | 139637 | sp | P03730 | VTAI#LAMBD
(69% identity in 224 amino acids) 

SEQ ID NO: 380 : 0.172807, 115, a tail assembly protein, 
similar to tail assembly proteins for example ,GpK 
[Bacteriophage lambda] gi | 139638 | sp | P03729 | VTAK#LAMBD
(85% identity in 174 amino acids) 

SEQ ID NO: 381 : -0.337367, 282, a minor tail component, 
similar to minor tail proteins for example ,GpL 
[Bacteriophage lambda] gi | 138844 | sp | P03738 | VMTL#LAMBD
(76% identity in 232 amino acids)

SEQ ID NO: 382 : -0.296774, 125, a minor tail component, 
similar to minor tail proteins for example ,GpM 

[Bacteriophage lambda] gi | 138845 | sp | P03737 | VMTM#LAMBD
(79% identity in 109 amino acids)

SEQ ID NO: 383: -0.091398, 94, a tail length determination,
similar to tail length tape measure protein precursors for 
example ,GpH [Bacteriophage lambda] 

gi | 138843 | sp | P03736 | VMTH#LAMBD (51% identity in 870 
amino acids) 

SEQ ID NO: 384: -0.319298, 1027, a minor tail component,
similar to minor tail proteins, for example, GpG-T
[Bacteriophage lambda] gi | 7429179 | pir | | TLBPTL (67%
identity in 134 amino acids), produced by translational
frameshift

SEQ ID NO: 385: -0.624779, 114, a minor tail component,
similar to minor tail proteins, for example, GpG [Bacteriophage
lambda] gi | 138842 | sp | P03734 | VMTG#LAMBD (43% identity in
140 amino acids)

SEQ ID NO: 386 : -0.477931, 146, novel, probably 
corresponding to protein V [Bacteriophage lambda]
SEQ ID NO: 387: -0.276079, 1159, a minor tail component,
similar to minor tail protein GpU [Bacteriophage lambda]
gi | 138847 | sp | P03732 | VMTU#LAMBD (80% identity in 131
amino acids) 

SEQ ID NO: 388 : -0.29799, 200, a minor tail component,
similar to minor tail proteins, for example, GpZ [Bacteriophage
lambda] gi | 138849 | sp | P03731 | VMTZ#LAMBD (69% identity in
177 amino acids) 

SEQ ID NO: 389: -0.661327, 438, a tail attachment (minor 
capsid protein), similar to minor capsid proteins, for
example, GpFII [Bacteriophage lambda]

gi | 137575 | sp | P03714 | VCF2#LAMBD (91% identity in 117 
amino acids) 

SEQ ID NO: 390 : -0.392135, 90, DNA-packaging, similar to 
DNA-packaging proteins, for example, GpFI [Bacteriophage
lambda] gi | 139324 | sp | P03709 | VPF1#LAMBD (98% identity in
132 amino acids) 

SEQ ID NO: 391: 0.522727, 89, a major capsid protein, similar 
to major capsid proteins for example ,GpE [Bacteriophage 
lambda] gi | 116701 | sp | P05481 | HEAD#BPPH8 (87% identity in
341 amino acids) 

SEQ ID NO: 392: -0.269369, 112, a head decoration protein 
(major capsid protein), similar to major capsid proteins for 
example ,GpD [Bacteriophage lambda] 

gi | 137566 | sp | P03712 | VCAD#LAMBD (99% identity in 110
amino acids) 

SEQ ID NO: 393 : -0.239229, 442, a minor capsid protein 
precursor, similar to minor capsid protein precursors for 
example ,GpC [Bacteriophage lambda] 

gi | 137565 | sp | P03711 | VCAC#LAMBD (97% identity in 439
amino acids), capsid assembly protein containing Nu3-homolog
SEQ ID NO: 394 : -0.247826, 231, a portal protein (minor capsid
protein), similar to portal proteins, for example, GpB
[Bacteriophage lambda] gi | 138762 | sp | P03710 | VMCB#LAMBD
(98% identity in 533 amino acids)

SEQ ID NO: 395: -0.441584, 304, a head-to-tail joining, similar 
to head-to-tail joining proteins for example ,GpW 

[Bacteriophage lambda] gi | 138415 | sp | P03727 | VHTJ#LAMBD
(98% identity in 68 amino acids) 

SEQ ID NO: 396: -0.434392, 190, a terminase large subunit
(DNA-packaging protein), similar to terminase large subunits,
for example, GpA [Bacteriophage lambda]

gi | 137616 | sp | P03708 | TERL#LAMBD (97% identity in 641 
amino acids), GTG start 

SEQ ID NO: 397: -0.085882, 86, a putative terminase small
subunit, similar to terminase small subunits, for example, Nu1
[Bacteriophage lambda] (82% identity in 180 amino acids)
SEQ ID NO: 398: -0.327551, 99, novel, similar to hypothetical
proteins, for example, [Escherichia coli]
gi | 1778472 | gb | aaB40755.1 | (90% identity in 53 amino acids)

SEQ ID NO: 399 : -0.445312, 513, a putative transcription 
regulatory element, similar to PerC (BfpW) transcription 
activator eaeA/bfpA [Escherichia coli] 

gi | 1172431 | sp | P43475 | PERC#ECOLI (47% identity in 87
amino acids)

SEQ ID NO: 400: 0.010435, 116, a putative lipoprotein Rz1
precursor, similar to lipoprotein Rz1 precursors, for
example, [phage lambda] gi | 540738 | pir | | JN0750 (70% identity
in 61 amino acids) 

SEQ ID NO: 401: -0.403175, 127, a putative host cell lysis,
similar to endopeptidases, for example, [Bacteriophage H-19B]
gi | 4335687 | gb | aaD17383.1 | (77% identity in 150 amino acids)
SEQ ID NO: 402: -0.542391, 93, novel, partially similar to 
hypothetical protein YchG [Escherichia coli] 

gi | 267475 | sp | P30192 | YCHG#ECOLI (80% identity in 30 amino
acids) 

SEQ ID NO: 403 : -0.42, 51, novel, partially similar to
hypothetical proteins, for example, YchG [Escherichia coli]
gi | 267475 | sp | P30192 | YCHG#ECOLI (95% identity in 60 amino
acids), GTG start

SEQ ID NO: 404: -0.364583, 49, novel, partially similar to a
hypothetical protein b1240 [Escherichia coli]
gi | 7466155 | pir | | C64871 (54% identity in 51 amino acids)
[0020] 

3) Proteins comprising Insertion Sequence; IS

(Sequence number: hydrophobicity, the number of amino
acids, [illegible])

SEQ ID NO: 405 : -0.221861, 216, novel, identical to 
hypothetical protein [Bacteriophage VT2-Sa] 

gi | 5881622 | dbj | Baa84313.1 | , but [having] different start;
similar to hypothetical protein [Bacteriophage 933W]
gi | 4499790 | emb | CAB39289.1 | (85% identity in 78 amino acids)
SEQ ID NO: 406 : -0.313776, 197, novel, identical to 
hypothetical protein [Bacteriophage VT2-Sa] 

gi | 5881623 | dbj | Baa84314.1 | , but [having] different start;
similar to hypothetical proteins, for example, NinB protein
[Bacteriophage 21] (43% identity in 147 amino acids)
SEQ ID NO: 407: -0.486667, 61, a putative DNA methylase, 
identical to hypothetical protein [Bacteriophage VT2-Sa] 

gi | 5881624 | dbj | Baa84315.1 | (100% identity in 175 amino
acids); similar to hypothetical protein Gp62 [Bacteriophage
HK97] gi | 6901634 | gb | aaF31137.1 | (98% identity in 175 amino
acids); similar to (at low level) DNA
N-6-adenine-methyltransferase (M.T1) [Enterobacteria phage
T1] gi | 166164 | gb | aaA87390.1 | (31% identity in 143 amino
acids) 

SEQ ID NO: 408: -0.175926, 55, novel, identical to
hypothetical protein [Bacteriophage VT2-Sa]
gi | 5881625 | dbj | Baa84316.1 | (100% identity in 60 amino
acids); similar to hypothetical proteins, for example, NinE
protein [Bacteriophage 21] gi | 4539480 | emb | CAB39989.1 | (98%
identity in 60 amino acids)

SEQ ID NO: 409: -0.017752, 170, novel, identical to
hypothetical protein [Bacteriophage VT2-Sa]
gi | 5881826 | dbj | Baa84317.1 | (100% identity in 57 amino acids),
GTG start

SEQ ID NO: 1375: -0.38883, 189, a putative antirepressor
protein, identical to hypothetical protein [Bacteriophage
VT2-Sa] gi | 5881627 | dbj | Baa84318.1 | (100% identity in 244
amino acids); its C-terminal part similar to C-terminal part of
antirepressor protein Ant [Bacteriophage P22]
gi | 131843 | sp | P03037 | RANT#BPP22 (82% identity in 104
amino acids), its N-terminal part similar to N-terminal part of
hypothetical protein [Bacteriophage TP901-1]
gi | 2924237 | emb | Caa74615.1 | (42% identity in 119 amino
acids)

SEQ ID NO: 1376: -0.209115, 374, a DNA-binding protein,
identical to Roi [Bacteriophage VT2-Sa]
gi | 5881628 | dbj | Baa84319.1 |, but [having] different start;
similar to Roi proteins, for example, [Enterobacteria phage
HK022] gi | 1197729 | gb | aaC48863.1 | (82% identity in 242
amino acids)

SEQ ID NO: 1377: 0.177508, 1028, novel, identical to
hypothetical protein orf15 [Bacteriophage 933W]
gi | 4499798 | emb | CAB39297.1 | (100% identity in 201 amino
acids), similar to hypothetical proteins, for example, NinG
protein [Bacteriophage 21] gi | 4539482 | emb | CAB39991.1 | (94%
identity in 201 amino acids)

SEQ ID NO: 1378: -0.144201, 458, novel, identical to
hypothetical protein orf16 [Bacteriophage 933W]
gi | 4499799 | emb | CAB39298.1 | (100% identity in 64 amino
acids); similar to hypothetical proteins, for example, Nin68
[Bacteriophage lambda] gi | 9626304 | ref | NP#040640.1 | (80%
identity in 60 amino acids)

SEQ ID NO: 1379: 0.890181, 388, antitermination protein Q,
identical to antitermination Q protein [Bacteriophage 933W]
gi | 4585416 | gb | aaD25444.1 | AF125520#39, but [having]
different start; similar to antitermination Q proteins, for
example, [Bacteriophage H-19B] gi | 2668768 | gb | aaD04655.1 |
(96% identity in 144 amino acids)

SEQ ID NO: - : -0.090909, 221, novel, partially similar to
hypothetical protein [Bacteriophage P27]
gi | 8346570 | emb | CAB93763.1 | (89% identity in 37 amino
acids), TTG start

SEQ ID NO: 1676: 0.087912, 92, a Shiga toxin 2 subunit A,
identical to gi | 1351074 | sp | P09385 | SLTA#BP933; identical to
ECs1908: Comp. (1899924-1900292), -0.25, 123, Shiga toxin 2
subunit B gi | 134538 | sp | P09386 | SLTB#BP933
SEQ ID NO: 1644: -0.397973, 297, novel, identical to
N-terminal part of hypothetical protein [Bacteriophage 933W]
gi | 4585419 | gb | aaD25447.1 | AF125520#42 (100% identity in
557 amino acids); similar to N-terminal part of hypothetical
proteins, for example, YjhS [Shigella dysenteriae]
gi | 6759965 | gb | aaF28123.1 | AF153317#19 (78% identity in 554
amino acids)

SEQ ID NO: 1645: -0.965741, 109, a transposase (OrfB)
(insertion sequence IS629), identical to
gi | 7443862 | pir | | T00240

SEQ ID NO: 1681: -0.893204, 104, a transposase (OrfA)
(insertion sequence IS629), identical to
gi | 7444868 | pir | | T00241 (100% identity in 108 amino acids)

SEQ ID NO: - : -0.342857, 85, novel, identical to hypothetical
protein [Bacteriophage 933W] gi | 4499806 | emb | CAB39305.1 |
(100% identity in 59 amino acids)

SEQ ID NO: - : -0.577099, 263, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 4585420 | gb | aaD25448.1 | AF125520#43 (100% identity in
148 amino acids)

SEQ ID NO: 877: -0.830769, 79, a putative holin protein,
identical to protein [Bacteriophage VT2-Sa]
gi | 5881636 | dbj | Baa84327.1 |; similar to putative holin
proteins, for example, [Shigella dysenteriae]
gi | 6759967 | gb | aaF28125.1 | AF153317#21 (95% identity in 71
amino acids)

SEQ ID NO: 878: -0.5, 141, an endolysin, identical to putative
endolysin [Bacteriophage 933W]
gi | 4585422 | gb | aaD25450.1 | AF125520#45 (100% identity in
177 amino acids); similar to putative endolysins, for
example, [Bacteriophage H-19B]
gi | 4335686 | gb | aaD17382.1 | (93% identity in 177 amino acids)

SEQ ID NO: 879: -1.08, 71, a putative antirepressor protein,
identical to putative antirepressor protein
[Bacteriophage 933W]
gi | 4585423 | gb | aaD25451.1 | AF125520#46; similar to
antirepressor protein Ant [Bacteriophage P22]
gi | 131843 | sp | P03037 | RANT#BPP22 (49% identity in 121
amino acids)

SEQ ID NO: 880: -0.375862, 88, a putative endopeptidase,
identical to endopeptidase [Bacteriophage 933W]
gi | 4585424 | gb | aaD25452.1 | AF125520#47 (100% identity in
154 amino acids); similar to endopeptidases, for example, Rz
[Bacteriophage lambda] gi | 119368 | sp | P00726 | ENPP#LAMBD
(72% identity in 154 amino acids)

SEQ ID NO: 881: -0.477359, 54, a putative lipoprotein Rz1
precursor, identical to putative Rz1 protein precursor
[Bacteriophage 933W]
gi | 4585425 | gb | aaD25453.1 | AF125520#48 (100% identity in 61
amino acids); similar to lipoprotein Rz1 precursor
[Bacteriophage lambda] gi | 1017781 | gb | aaC48862.1 | (72%
identity in 61 amino acids)

SEQ ID NO: 882: -0.293827, 82, a Bor protein precursor,
identical to [Bacteriophage 933W]
gi | 4585426 | gb | aaD25454.1 | AF125520#49 (100% identity in 97
amino acids); similar to Bor protein precursor [Bacteriophage
lambda] gi | 137520 | sp | P26814 | VBOR#LAMBD (96% identity in
97 amino acids)

SEQ ID NO: 883: -0.305483, 384, novel, similar to
hypothetical protein [Bacteriophage VT2-Sa]
gi | 5881640 | dbj | Baa84331.1 | (85% identity in 75 amino acids)

SEQ ID NO: 884: -0.434955, 330, a putative small subunit
terminase, identical to putative small subunit terminase
[Bacteriophage 933W]
gi | 4585427 | gb | aaD25455.1 | AF125520#50 (100% identity in
268 amino acids)

SEQ ID NO: 885: -0.576025, 464, a putative terminase large
subunit, identical to putative terminase large subunit
[Bacteriophage 933W]
gi | 4585428 | gb | aaD25456.1 | AF125520#51 (100% identity in
568 amino acids)

SEQ ID NO: 886: -0.238694, 200, a putative portal protein,
identical to putative portal protein [Bacteriophage 933W]
gi | 4585429 | gb | aaD25457.1 | AF125520#52 (100% identity in
714 amino acids)

SEQ ID NO: 887: -0.438542, 97, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 4585430 | gb | aaD25458.1 | AF125520#53 (100% identity in
335 amino acids)

SEQ ID NO: 888: -0.264131, 185, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 4585431 | gb | aaD25459.1 | AF125520#54 (100% identity in
404 amino acids)

SEQ ID NO: 889: -0.237063, 144, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 4585432 | gb | aaD25460.1 | AF125520#55 (100% identity in
129 amino acids)

SEQ ID NO: 890: 1.472727, 56, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 4585433 | gb | aaD25461.1 | AF125520#56, but [having]
different start

SEQ ID NO: 891: -0.255915, 618, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 4585434 | gb | aaD25462.1 | AF125520#57, but [having]
different start

SEQ ID NO: 892: 0.052113, 72, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 4585435 | gb | aaD25463.1 | AF125520#58 (100% identity in
216 amino acids)

SEQ ID NO: 893: -0.046491, 115, a putative tail fiber protein,
identical to putative tail fiber protein [Bacteriophage 933W]
gi | 4585436 | gb | aaD25464.1 | AF125520#59 (100% identity in 645
amino acids)

SEQ ID NO: 894: -0.466667, 178, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 4585437 | gb | aaD25465.1 | AF125520#60, but [having]
different start

SEQ ID NO: 895: -0.283069, 190, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 4585438 | gb | aaD25466.1 | AF125520#61, but [having]
different start

SEQ ID NO: 896: -0.472903, 156, novel [putative outer
membrane protein; OMP], TTG start

SEQ ID NO: 897: -0.717334, 76, novel [periplasmic], identical
to hypothetical protein [Bacteriophage 933W]
gi | 4585439 | gb | aaD25467.1 | AF125520#62 (100% identity in
567 amino acids); its N-terminal part similar to hypothetical
protein [Bacteriophage P-EibD]
gi | 7523538 | gb | aaF63043.1 | AF151675#5 (98% identity in 147
amino acids), GTG start

SEQ ID NO: 898: -0.468595, 122, a putative tail tip fiber
protein, identical to hypothetical protein [Bacteriophage
933W] gi | 4585440 | gb | aaD25468.1 | AF125520#63 (100%
identity in 422 amino acids); similar to (at low level) tail tip
fiber protein gp21 [phage N15] gi | 7444604 | pir | | T13107 (24%
identity in 381 amino acids)

SEQ ID NO: 899: -1.029787, 48, novel [putative outer
membrane protein; OMP], identical to hypothetical protein
[Bacteriophage 933W]
gi | 4585441 | gb | aaD25469.1 | AF125520#64, but [having]
different start, TTG start

SEQ ID NO: 900: -0.648128, 188, novel [putative outer
membrane protein; OMP], identical to hypothetical protein
[Bacteriophage 933W]
gi | 4585442 | gb | aaD25470.1 | AF125520#65 (100% identity in
205 amino acids)

SEQ ID NO: 901: -0.117179, 554, a putative outer membrane
precursor, identical to putative Lom precursor [Bacteriophage
933W] gi | 4585443 | gb | aaD25471.1 | AF125520#66 (100%
identity in 244 amino acids); similar to outer membrane
protein rck [Salmonella typhimurium] gi | 282013 | pir | | A43309
(35% identity in 172 amino acids); outer membrane protein
Lom precursor gi | 138693 | sp | P03701 | VLOM#LAMBD (35%
identity in 167 amino acids); ail gene products, for
example, [Yersinia pseudotuberculosis]
gi | 5902750 | sp | Q56957 | AIL#YERPS (32% identity in 241 amino
acids); virulence protein pagC precursor [Salmonella
typhimurium] gi | 129558 | sp | P23988 | PAGC#SALTY (29%
identity in 180 amino acids)

SEQ ID NO: 902: -0.148992, 646, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 4585444 | gb | aaD25472.1 | AF125520#67 (100% identity in
133 amino acids)

SEQ ID NO: 903: -0.831147, 62, novel, similar to hypothetical
protein [Bacteriophage 933W]
gi | 4585445 | gb | aaD25473.1 | AF125520#68 (100% identity in
218 amino acids)

SEQ ID NO: 904: -0.482819, 455, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 4585446 | gb | aaD25474.1 | AF125520#69 (100% identity in
148 amino acids)

SEQ ID NO: 905: -0.420639, 408, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 4585447 | gb | aaD25475.1 | AF125520#70 (100% identity in 83
amino acids)

SEQ ID NO: 906: -0.063889, 109, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 4585448 | gb | aaD25476.1 | AF125520#71 (100% identity in
421 amino acids)

SEQ ID NO: 907: -0.171552, 117, novel, similar to
hypothetical protein [Bacteriophage 933W]
gi | 4585449 | gb | aaD25477.1 | AF125520#72 (99% identity in
2793 amino acids)

SEQ ID NO: 908: -0.512838, 149, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 4585450 | gb | aaD25478.1 | AF125520#73, but [having]
different start

SEQ ID NO: 909: 0.189474, 115, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 9632540 | ref | NP#049534.1 | (100% identity in 114 amino
acids); similar to hypothetical proteins, for example, ygiW
protein precursor [Escherichia coli]
gi | 1723887 | sp | P52083 | YGIW#ECOLI (53% identity in 93
amino acids)

SEQ ID NO: 910: -0.313446, 239, a MokW protein (prophage
maintenance; modulation of host cell killing), identical to MokW
[Bacteriophage 933W]
gi | 4585453 | gb | aaD25481.1 | AF125520#76 (100% identity in 70
amino acids); similar to GelF [Escherichia coli]
gi | 1786200 | gb | aaC73129.1 | (73% identity in 69 amino acids)

SEQ ID NO: 911: -0.276613, 125, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 4585454 | gb | aaD25482.1 | AF125520#77, but [having]
different start

SEQ ID NO: 912: -0.091398, 94, novel, identical to
hypothetical protein [Bacteriophage VT2-Sa]
gi | 5881668 | dbj | Baa84359.1 | (100% identity in 219 amino
acids); identical to C-terminal part of hypothetical protein
[Bacteriophage 933W]
gi | 4585455 | gb | aaD25483.1 | AF125520#78 (100% identity in 219
amino acids)

SEQ ID NO: 913: -0.343275, 1027, novel, identical to
hypothetical protein [Bacteriophage VT2-Sa]
gi | 5881669 | dbj | Baa84360.1 |, but [having] different start;
similar to hypothetical protein [Bacteriophage 933W]
gi | 7649907 | dbj | Baa94185.1 | (92% identity in 72 amino acids);
hypothetical proteins, for example, [Bacteriophage VT2-Sa]
gi | 4585386 | gb | aaD25414.1 | AF125520#9 (92% identity in 68
amino acids)

SEQ ID NO: 914: -0.624779, 114, novel, identical to
hypothetical protein [Bacteriophage VT2-Sa]
gi | 5881670 | dbj | Baa84361.1 | (100% identity in 94 amino acids),
GTG start

SEQ ID NO: 915: -0.332759, 233, novel, identical to
hypothetical protein [Bacteriophage VT2-Sa]
gi | 5881671 | dbj | Baa84362.1 | (100% identity in 73 amino
acids); similar to C4-type zinc finger proteins (TraR family),
for example, orf39 [Pseudomonas aeruginosa phage phi CTX]
gi | 4063813 | dbj | Baa36267.1 | (42% identity in 59 amino acids)
SEQ ID NO: 916: -0.407287, 248, a putative anti-repressor
protein, identical to hypothetical protein [Bacteriophage
VT2-Sa] gi | 5881672 | dbj | Baa84363.1 | (100% identity in 209
amino acids); similar to hypothetical protein HI1422
[Haemophilus influenzae Rd]
gi | 1175795 | sp | P44193 | YE22#HAEIN (40% identity in 158
amino acids); putative phage anti-repressor proteins, for
example, [Neisseria meningitidis]
gi | 7379969 | emb | CAB84545.1 | (49% identity in 112 amino
acids)

SEQ ID NO: 917: 0.069027, 227, novel 

SEQ ID NO: 918: -1.014706, 69, probably resistance to phage
N4, lambda, Rtn membrane associated protein [Escherichia
coli] gi | 2498867 | sp | P76446 | RTN#ECOLI (31% identity in 498
amino acids)

SEQ ID NO: 919: -0.130857, 176, novel, similar to FidL
[Salmonella typhimurium] gi | 4324611 | gb | aaD16955.1 | (29%
identity in 149 amino acids)

SEQ ID NO: 920: -0.304721, 1166, a putative
transcriptional activator, similar to transcription activators,
for example, MarT [Salmonella typhimurium]
gi | 4324612 | gb | aaD16956.1 | (28% identity in 268 amino acids)

SEQ ID NO: 921: -0.308543, 200, a putative oxidoreductase,
similar to oxidoreductases, for example, [Escherichia coli]
gi | 2492762 | sp | P76633 | YGCW#ECOLI (55% identity in 257
amino acids)

SEQ ID NO: 922: -0.814127, 362, a putative chaperone,
similar to hypothetical proteins, for example, ORF60
[Yersinia pestis] gi | 7467334 | pir | | T17432 (48% identity in 204
amino acids); chaperone proteins, for example, EcpD
[Escherichia coli] gi | 2506408 | sp | P33128 | ECPD#ECOLI (35%
identity in 185 amino acids)

SEQ ID NO: 923: -0.431859, 114, novel, similar to
hypothetical proteins, for example, ORF59 [Yersinia pestis]
gi | 4106627 | emb | Caa21382.1 | (34% identity in 438 amino
acids)

SEQ ID NO: 924: -0.114136, 192, a putative outer membrane
usher protein, similar to hypothetical protein ORF58
[Yersinia pestis] gi | 4106626 | emb | Caa21381.1 | (44% identity
in 824 amino acids); outer membrane usher proteins, for
example, FimD [Salmonella typhimurium]
gi | 585135 | sp | P37924 | FIMD#SALTY (32% identity in 832
amino acids)

SEQ ID NO: 925: -0.282297, 210, a putative chaperone,
similar to hypothetical protein ORF57 [Yersinia pestis]
gi | 4106625 | emb | Caa21380.1 | (39% identity in 233 amino
acids); chaperone proteins, for example, EcpD [Escherichia
coli] gi | 2506408 | sp | P33128 | ECPD#ECOLI (36% identity in 217
amino acids)

SEQ ID NO: 926: -0.123005, 214, a putative pilin protein,
similar to hypothetical protein ORF56 [Yersinia pestis]
gi | 4106624 | emb | Caa21379.1 | (36% identity in 185 amino
acids); major pilin proteins, for example, SfaA
[Escherichia coli] gi | 4105989 | gb | aaD02646.1 | (32% identity in
181 amino acids)

SEQ ID NO: - : -0.309259, 109, novel 

SEQ ID NO: 1488: -0.323145, 1012, a putative filamentous
hemagglutinin-like protein, similar to
hemagglutinin/hemolysin-related proteins [Neisseria
meningitidis], for example, gi | 7225719 | gb | aaF40927.1 | (25%
identity in 1001 amino acids); filamentous hemagglutinin B
precursor [Bordetella pertussis] gi | 78213 | pir | | S21010 (20%
identity in 824 amino acids)

SEQ ID NO: - : -0.353779, 808, a putative hemolysin
activator-related protein, similar to hemolysin activator-related
proteins, for example, [Pectobacterium chrysanthemi]
gi | 1772622 | gb | aaC31980.1 | (27% identity in 484 amino
acids); hemolysin activation protein precursor [Serratia
marcescens] gi | 123205 | sp | P15321 | HLYB#SERMA (24%
identity in 475 amino acids)

SEQ ID NO: 1608: -0.270213, 142, a putative
holo-[acyl-carrier protein] synthase, similar to
holo-[acyl-carrier protein] synthases, for
example, [Campylobacter jejuni] gi | 6968838 | emb | CAB73833.1 |
(39% identity in 121 amino acids)



Appendix B: Hideo et al. Full Translation

SEQ ID NO: 1609: -0.224107, 113, a putative 3-oxoacyl-(acyl
carrier protein) reductase, similar to 3-oxoacyl-(acyl carrier
protein) reductases, for example, [Moritella marina]
gi | 7227179 | gb | aaF42251.1 | (41% identity in 188 amino acids)

SEQ ID NO: 1610: -0.570629, 144, a putative
(3R)-hydroxymyristol-(acyl carrier protein) dehydratase,
similar to (3R)-hydroxymyristol-(acyl carrier protein)
dehydratases, for example, gi | 7190847 | gb | aaF39621.1 (30%
identity in 158 amino acids)

SEQ ID NO: 1611: -0.0544, 126, a putative acyl carrier
protein, similar to acyl carrier proteins, for example, AcpC
[Streptococcus agalactiae]
gi | 4886773 | gb | aaD32036.1 | AF093787#4 (38% identity in 86
amino acids)

SEQ ID NO: 1409: -0.480057, 703, a putative aminomethyl
transferase, similar to aminomethyl transferases, for
example, gi | 7450600 | pir | | C75088 (26% identity in 333 amino
acids)

SEQ ID NO: 1410: -0.678001, 1401, a putative
3-oxoacyl-[acyl-carrier-protein] synthase, its N-terminal-half
part is similar to 3-oxoacyl-[acyl-carrier-protein] synthase (EC
2.3.1.41) [Bacillus subtilis] gi | 7433750 | pir | | G69842 (37%
identity in 393 amino acids); its C-terminal-half part is similar
to gi | 7433750 | pir | | G69842 (22% identity in 439 amino acids);
similar to N- and C-terminal-half part nodulation proteins
(nodE), for example, [Rhizobium meliloti plasmid]
gi | 128459 | sp | P06230 | NODE#RHIME, product comprises two
3-oxoacyl-[acyl-carrier-protein]

SEQ ID NO: 1628: -0.368862, 168, novel, similar to (at low
level) a part of polyketide synthases, for
example, [Streptomyces sp. strain MA6548]
gi | 7481905 | pir | | T17428 (23% identity in 201 amino acids)

SEQ ID NO: - : -0.500273, 367, novel

SEQ ID NO: - : -0.253226, 63, a putative ABC transporter,
similar to putative ABC transporters (ATP-binding protein),
for example, [Thermotoga maritima] gi | 7445988 | pir | | H72342
(50% identity in 222 amino acids)

SEQ ID NO: 1538: -0.112712, 237, novel

SEQ ID NO: 1539: 0.259358, 188, novel [hypothetical
membrane protein], similar to hypothetical proteins, for
example, BBJ27 [Lyme disease spirochete plasmid J/lp38]
gi | 7463805 | pir | | D70248 (25% identity in 399 amino acids)

SEQ ID NO: 1633: -1.014893, 95, novel [periplasmic]

SEQ ID NO: 1634: -0.166975, 325, novel

SEQ ID NO: - : -0.77625, 81, a phage integration, similar to
integrases, for example, [Vibrio cholerae]
gi | 498253 | gb | aaC44230.1 | (32% identity in 390 amino acids)
(P4-like integrase)

SEQ ID NO: 2: -0.123944, 214, novel, similar to (at low level)
hemagglutinin main component [Clostridium botulinum phage
(type C)] gi | 1346254 | sp | P46084 | HA33#CLOBO (23% identity
in 190 amino acids)

SEQ ID NO: 3: -0.274163, 210, a transposase, similar to sB
proteins, for example, [Shigella dysenteriae Iso-IS1]
gi | 8759959 | gb | aaF28117.1 | AF153317#13 (72% identity in 129
amino acids), GTG start

SEQ ID NO: 4: -0.112565, 192, a putative regulatory protein,
similar to prophage CP4-57 regulatory protein AlpA [Escherichia
coli (strain K-12)] gi | 461502 | sp | P33997 | ALPA#ECOLI (52%
identity in 61 amino acids)

SEQ ID NO: 5: -0.320225, 90, novel, similar to hypothetical
protein b2625 (YfjI) [Escherichia coli K-12]
gi | 1723621 | sp | P52124 | YFJI#ECOLI (40% identity in 444
amino acids)

SEQ ID NO: 6: -0.628261, 93, novel, similar to (at low level)
hypothetical protein Cj1244 [Campylobacter jejuni]
gi | 6968877 | emb | CAB73498.1 | (25% identity in 78 amino acids)

SEQ ID NO: 7: -0.642435, 272, novel, similar to hypothetical
protein A153R [Chlorella virus PBCV-1]
gi | 7461298 | pir | | T17644 (32% identity in 365 amino acids);
DNA repair protein rad25 PAB0128 [Pyrococcus abyssi (strain
Orsay)] gi | 7514780 | pir | | A75209 (28% identity in 392 amino
acids); putative helicase (D10 protein) [Bacteriophage T5]
gi | 137606 | sp | P11107 | VD10#BPT5 (27% identity in 393 amino
acids)

SEQ ID NO: 8: -0.313568, 200, novel, TTG start 
SEQ ID NO: 9: -0.309146, 1160, novel, identical to L0015
[Escherichia coli] gi | 3414883 | gb | aaC31494.1 | (100% identity
in 512 amino acids); similar to hypothetical proteins, for
example, [Escherichia coli] gi | 3288156 | emb | Caa11509.1 | (99%
identity in 411 amino acids)

SEQ ID NO: 10: 0.086667, 226, novel, identical to L0014
[Escherichia coli] gi | 3288157 | emb | Caa11510.1 | (100% identity
in 115 amino acids); similar to hypothetical proteins, for
example, orf50 [Escherichia coli] gi | 6009426 | dbj | Baa84885.1 |
(76% identity in 107 amino acids)

SEQ ID NO: 11: -0.430396, 228, novel, similar to hypothetical
proteins, for example, L0013 [Escherichia coli]
gi | 3414881 | gb | aaC31492.1 | (98% identity in 133 amino acids),
GTG start

SEQ ID NO: 12: -0.358621, 233, a IS30 transposase
(interrupted), similar to N-terminal part of IS30 transposases,
for example, gi | 2851554 | sp | P37246 | TRA8#ECOLI (99% identity
in 101 amino acids)

SEQ ID NO: 13: -0.43945, 110, a putative transposase,
similar to transposases, for example, Hpl [Escherichia coli]
gi | 3661482 | gb | aaC61713.1 | (98% identity in 272 amino acids),
InsB [Shigella dysenteriae]
gi | 5532467 | gb | aaD44751.1 | AF141323#22 (98% identity in 272
amino acids)

SEQ ID NO: 14: -0.352643, 871, a putative complement
resistance protein precursor, similar to lipoprotein traT
precursors, for example, gi | 418135 | sp | P32885 | TRT1#ECOLI
(83% identity in 227 amino acids), TTG start

SEQ ID NO: 15: -0.186861, 138, novel 

SEQ ID NO: 16: -0.535714, 141, novel

SEQ ID NO: 17: -0.34, 251, novel

SEQ ID NO: 18: -0.155725, 132, a putative diacylglycerol
kinase, similar to diacylglycerol kinases, for
example, gi | 125321 | sp | P00556 | KDGL#ECOLI (76% identity in
119 amino acids)

SEQ ID NO: 19: -0.514689, 178, novel [putative outer
membrane protein; OMP], similar to hypothetical proteins,
for example, yjdB in basS-adiY intergenic region
[Escherichia coli] gi | 731986 | sp | P30845 | YJDB#ECOLI (45%
identity in 428 amino acids)

SEQ ID NO: 20: -0.476923, 118, novel, TTG start 

SEQ ID NO: 21: -0.231818, 133, novel

SEQ ID NO: 22: -0.38651, 342, novel, GTG start

SEQ ID NO: 23: -0.159091, 111, an urease accessory protein
UreD, similar to UreD urease-associated proteins, for
example, [Klebsiella aerogenes]
gi | 731078 | sp | Q09063 | URED#KLEAE (71% identity in 242
amino acids), TTG start

SEQ ID NO: 24: -0.048747, 440, an urease gamma subunit,
similar to urease gamma subunits, for example, [Klebsiella
pneumoniae] gi | 137084 | sp | P18316 | URE3#KLEAE (96% identity
in 100 amino acids)

SEQ ID NO: 25 : -0.431519, 534, an urease beta subunit, 
similar to urease beta subunits for example , [Klebsiella 
pneumoniae] gi | 137077 | sp | P18315 | URE2#KLEAE (82% 
identity in 106 amino acids) 

SEQ ID NO: 26: -0.526471, 69, an urease alpha subunit,
similar to urease alpha subunits, for example, [Klebsiella
pneumoniae] gi | 137070 | sp | P18314 | URE1#KLEAE (90%
identity in 567 amino acids)

SEQ ID NO: 27: -0.582995, 642, an urease accessory protein,
similar to UreE urease accessory proteins, for
example, [Klebsiella aerogenes]
gi | 137095 | sp | P18317 | UREE#KLEAE (80% identity in 154
amino acids)

SEQ ID NO: 28: -0.439779, 182, an urease accessory protein,
similar to UreF urease accessory proteins, for
example, [Klebsiella aerogenes]
gi | 137097 | sp | P18318 | UREF#KLEAE (79% identity in 224
amino acids)

SEQ ID NO: 29: -0.995946, 75, an urease accessory protein,
similar to UREG urease accessory proteins, for
example, [Klebsiella aerogenes] gi | 137099 | sp | P18319 |
UREG#KLEAE (90% identity in 205 amino acids)

SEQ ID NO: 30: -0.961539, 105, novel, similar to hypothetical
proteins, for example, TnpJ [Shigella flexneri]
gi | 5532468 | gb | aaD44752.1 | AF141323#23 (100% identity in 87
amino acids)
[0021] 

4) Proteins derived from phage 

Sequence number: hydrophobicity, the number of amino acids,
character of each structure

SEQ ID NO: 31: 0.178689, 62, a putative antirepressor,
similar to antirepressors, for example, [Bacteriophage 933W]
gi | 4585423 | gb | aaD25451.1 | AF125520#46 (99% identity in 189
amino acids)

SEQ ID NO: 32: -0.403947, 153, a putative host cell lysis protein,
similar to endolysins, for example, [Bacteriophage 933W]
gi | 4585422 | gb | aaD25450.1 | AF125520#45 (97% identity in 177
amino acids)

SEQ ID NO: 33: -0.280953, 190, novel, similar to hypothetical
protein gi | 3183262 | sp | P76160 | YDFR#ECOLI (45% identity in
74 amino acids)

SEQ ID NO: 34: -0.440678, 178, a putative holin protein,
similar to holins, for example, [Bacteriophage VT2-Sa]
gi | 5881836 | dbj | Baa84327.1 | (97% identity in 68 amino acids)

SEQ ID NO: 35: -0.074561, 115, novel, similar to hypothetical
protein [Bacteriophage VT2-Sa] gi | 5881634 | dbj | Baa84325.1 |
(53% identity in 602 amino acids)

SEQ ID NO: 36: 0.142647, 69, novel, similar to tellurium
resistance protein TerB proteins, for example, [Deinococcus
radiodurans] gi | 7473690 | pir | | C75302 (26% identity in 129
amino acids)

SEQ ID NO: 37: -0.225415, 603, a putative transcription
regulatory element, similar to transcription regulatory
elements, for example, [Escherichia coli]
gi | 586679 | sp | P37638 | YHIW#ECOLI (34% identity in 197
amino acids)

SEQ ID NO: 38: -0.247553, 144, similar to hypothetical
protein [Bacteriophage P27] gi | 8346569 | emb | CAB93762.1 |
(96% identity in 83 amino acids)

SEQ ID NO: 39: 0.054872, 196, a putative anti-terminator
protein, similar to Q protein [Bacteriophage 21]
gi | 7440086 | pir | | D71568 (31% identity in 45 amino acids)
SEQ ID NO: 40: -0.147692, 66, a putative crossover junction
endodeoxyribonuclease, similar to crossover junction
endodeoxyribonuclease [Escherichia coli]
gi | 2507117 | sp | P40116 | RUS#ECOLI (42% identity in 94 amino
acids); Gp67 [Bacteriophage HK97] gi | 6901639 | gb | aaF31142.1 |
(61% identity in 98 amino acids)

SEQ ID NO: 41: -0.278804, 185, similar to B1560#ECOLI
gi | 1787843 (85% identity in 354 amino acids)

SEQ ID NO: 42: -0.439604, 102, novel

SEQ ID NO: 43: -0.380555, 361, novel, similar to hypothetical
proteins, for example, [Bacteriophage 933W]
gi | 4585451 | gb | aaD25479.1 | AF125520#74 (99% identity in 114
amino acids); Ygi [Escherichia coli]
gi | 1723887 | sp | P52083 | YGIW#ECOLI (53% identity in 93
amino acids)

SEQ ID NO: 44: -0.741111, 91, a prophage maintenance
protein; modulation of host cell killing, identical to MokW
[Bacteriophage 933W]
gi | 4585453 | gb | aaD25481.1 | AF125520#76 (100% identity in 70
amino acids); similar to Hok/Gef family, for example, Gef
[Escherichia coli] gi | 2120017 | pir | | S40540 (73% identity in 69
amino acids)

SEQ ID NO: 45: -0.235088, 115, novel, similar to hypothetical
proteins, for example, [Bacteriophage 933W]
gi | 4585382 | gb | aaD25410.1 | AF125520#5 (67% identity in 77
amino acids)

SEQ ID NO: 46: 0.222857, 71, novel, similar to hypothetical
protein [Bacteriophage 933W]
gi | 4585384 | gb | aaD25412.1 | AF125520#7 (70% identity in 72
amino acids)

SEQ ID NO: 47: -0.37027, 186, novel, GTG start

SEQ ID NO: 48: 0.130555, 73, novel, GTG start

SEQ ID NO: 49: -0.680583, 104, novel, similar to Gp9
[Bacteriophage Mu] gi | 6010430 | gb | aaF01133.1 | AF083977#54
(28% identity in 94 amino acids)

SEQ ID NO: 50: 0.116, 76, novel, similar to hypothetical
protein YdaW [Escherichia coli]
gi | 3025105 | sp | P76066 | YDAW#ECOLI (56% identity in 143
amino acids), TTG start

SEQ ID NO: 51: -0.382796, 94, a putative replication protein,
similar to C-terminal-half part of replication protein 14
[Bacteriophage phi-80] gi | 137937 | sp | P14814 | VG14#BPPH8
(45% identity in 129 amino acids)

SEQ ID NO: 52: -0.438934, 245, novel, similar to
C-terminal-half part of DnaT [Escherichia coli]
gi | 1361001 | pir | | 856589 (49% identity in 95 amino acids)

SEQ ID NO: 53: -0.760454, 221, novel, similar to hypothetical
protein [Escherichia coli] gi | 3025103 | sp | P76064 | YDAT#ECOLI
(30% identity in 141 amino acids)

SEQ ID NO: 54: -0.684726, 348, a putative regulatory protein,
similar to Cro [Bacteriophage P22]
gi | 132195 | sp | P09964 | RCRO#BPP22 (39% identity in 53 amino
acids)

SEQ ID NO: 55: -0.385816, 142, a putative repressor protein,
similar to repressor proteins, for example, C2 [Bacteriophage
P22] gi | 133359 | sp | P03035 | RPC2#BPP22 (27% identity in 188
amino acids)

SEQ ID NO: 56: -0.0975, 81, novel, similar to hypothetical
proteins, for example, YdfK [Escherichia coli]
gi | 140584 | sp | P29008 | YDFA#ECOLI (87% identity in 49 amino
acids); YdaF gi | 3915965 | sp | P38395 | YDAF#ECOLIF (83%
identity in 49 amino acids)

5715 SEQ ID NO: 57: 0.15977, 175, novel, similar to (at low level) 
ATP-dependent protease La homolog 

gi | 1708857 | sp | P42425 | LON2#BACSU (27% identity in 95 
amino acids) 

SEQ ID NO: 58: -0.425974, 78, novel 
SEQ ID NO: 59: -0.477358, 213, novel, TTG start

SEQ ID NO: 60 : -0.526087, 70, a putative cell division 
inhibitor, similar to DicB [Escherichia coli] 
gi | 226094 | prf | | 1410309A (67% identity in 55 amino acids)
SEQ ID NO: 61: -0.439535, 87, novel, similar to hypothetical 
5725 protein YdfD [Escherichia coli] 

gi | 140587 | sp | P29010 | YDFD#ECOLI (45% identity in 62 amino
acids) 

SEQ ID NO: 62: -0.11129, 83, a putative exonuclease, similar 
to exonucleases for example, exodeoxyribonuclease VIII
[Escherichia coli] gi | 2507105 | sp | P15032 | RECE#ECOLI (57%
identity in 350 amino acids) 

SEQ ID NO: 63: 0.082258, 63, a putative integrase, similar to 
N-terminal part of putative integrases for example,
[Escherichia coli cryptic prophage]
gi | 7449509 | pir | | E64913 (93% identity in 183 amino acids),
TTG start, probably disrupted 
SEQ ID NO: 64: -0.580917, 415, novel 

SEQ ID NO: 65: -0.50929, 184, a transposase (OrfB), identical 
to transposase [Escherichia coli plasmid pO-157 IS629]
gi | 7443862 | pir | | T00240

SEQ ID NO: 66: -0.175, 85, a transposase (OrfA), identical to 
hypothetical protein [Escherichia coli plasmid pO-157
insertion sequence IS629] gi | 7444868 | pir | | T00241

SEQ ID NO: 67: -0.397973, 297, a putative transposase,
similar to putative transposases for example, [Yersinia
pestis plasmid pMT1] gi | 7447905 | pir | | T14710 (78%
identity in 257 amino acids), TTG start, probably disrupted
SEQ ID NO: 68: -0.965741, 109, novel, identical to L0013
[Escherichia coli O-157:H7 strain EDL933]
gi | 3414881 | gb | aaC31492.1 | (100% identity in 126 amino
acids); similar to hypothetical proteins for example, Hp3
[Escherichia coli strain CFT073] gi | 3661484 | gb | aaC61715.1 |
(100% identity in 74 amino acids) 

SEQ ID NO: 69: -0.092042, 290, novel, identical to L0014 
[Escherichia coli O-157:H7 strain EDL933]
gi | 3288157 | emb | Caa11510.1 | (100% identity in 115 amino
acids); similar to hypothetical proteins for example, Orf50
[Escherichia coli strain B171] gi | 6009426 | dbj | Baa84885.1 |
(76% identity in 107 amino acids)

SEQ ID NO: 70: -0.403175, 127, novel, identical to L0015
[Escherichia coli O-157:H7 strain EDL933]

gi | 3414883 | gb | aaC31494.1 | (100% identity in 512 amino
acids); similar to hypothetical proteins for example,
[Escherichia coli plasmid pColV-K30]
gi | 3288156 | emb | Caa11509.1 | (99% identity in 411 amino
acids) 

SEQ ID NO: 71 : 0.010435, 116, a putative transposase 
(interrupted), similar to N-terminal part of transposases, for
example, [Escherichia coli strain B171]
gi | 1004096 | gb | aaB36833.1 | (89% identity in 132 amino acids)
SEQ ID NO: 72: -0.445312, 513, novel, similar to hypothetical 
proteins for example ,ORF2 in trcA region [Escherichia coli 
strain B171-8] gi | 4126790 | dbj | Baa36748.1 | (41% identity in 
209 amino acids); ORF4 in trcA region [Escherichia coli strain 
5775 B171-8] gi | 4126792 | dbj | Baa36750.1 | (36% identity in 133 
amino acids) 

SEQ ID NO: 73: -0.736428, 141, novel, similar to hypothetical 
protein [Lactococcus bacteriophage c2]

gi | 1146281 | gb | aaA92162.1 | (31% identity in 59 amino acids), 
5780 GTG start 

SEQ ID NO: 74: -0.321951, 124, novel 

SEQ ID NO: 75: -0.187826, 116, novel, similar to hypothetical 
proteins for example, ORF4 in trcA region [Escherichia coli
strain B171-8] gi | 4126792 | dbj | Baa36750.1 | (39% identity in 
5785 124 amino acids); ORF2 in trcA region [Escherichia coli strain 
B171-8] gi | 4126790 | dbj | Baa36748.1 | (27% identity in 171 
amino acids) 

SEQ ID NO: 76: 0.102083, 49, novel, similar to hypothetical 
protein [Bacteriophage 933W] gi | 7649887 | dbj | Baa94165.1 |
(93% identity in 89 amino acids)

SEQ ID NO: 77: -0.173373, 170, a putative tail fiber protein, 
similar to tail fiber proteins for example, [Bacteriophage
933W] gi | 4585436 | gb | aaD25464.1 | AF125520#59 (34% identity
in 339 amino acids) 

SEQ ID NO: 78: -0.320225, 90, a putative outer membrane
protein, similar to Lom outer membrane proteins for
example, [Bacteriophage P-EibA]
gi | 7532789 | gb | aaF63231.1 | AF151091#2 (68% identity in 199
amino acids) 

SEQ ID NO: 79: -0.644471, 408, a probable host specificity
protein (partial), similar to C-terminal-half part of protein
J [Bacteriophage lambda]
gi | 138412 | sp | P03749 | VHSJ#LAMBD (38% identity in 77 amino
acids), GTG start, probably disrupted by frameshift

SEQ ID NO: 80: -0.313568, 200, a host specificity protein
(partial), partially similar to protein J [Bacteriophage lambda]
gi | 138412 | sp | P03749 | VHSJ#LAMBD (85% identity in 639
amino acids), probably disrupted by frameshift 

SEQ ID NO: 81 : 0.256338, 72, a host specificity protein 
(interrupted), similar to N-terminal part of protein J
[Bacteriophage lambda]
gi | 138412 | sp | P03749 | VHSJ#LAMBD (80% identity in 369
amino acids), truncated by frameshift 

SEQ ID NO: 82: -0.181623, 654, similar to tail assembly, tail
assembly proteins for example, GpI [Bacteriophage lambda]
gi | 139637 | sp | P03730 | VTAI#LAMBD (68% identity in 224
amino acids) 

SEQ ID NO: 83: -0.403069, 392, tail assembly, similar to tail 
assembly proteins for example ,GpK [Bacteriophage lambda] 
5820 gi | 139638 | sp | P03729 | VTAK#LAMBD (85% identity in 196 
amino acids), GTG start 

SEQ ID NO: 84: 0.103097, 227, a minor tail component, similar 
to minor tail proteins for example ,GpL [Bacteriophage 
lambda] gi | 138844 | sp | P03738 | VMTL#LAMBD (76% identity in
232 amino acids)

SEQ ID NO: 85 : -0.412946, 225, a putative minor tail 
component, similar to minor tail proteins for example ,GpM 
[Bacteriophage lambda] 
gi | 138845 | sp | P03737 | VMTM#LAMBD (44% identity in 110
amino acids), GTG start

SEQ ID NO: 86: -0.340086, 233, a putative tail length tape 
measure protein, similar to tail length tape measure proteins 
for example, [Bacteriophage HK97]
gi | 6901589 | gb | aaF31092.1 | AF069529#5 (52% identity in 1076
amino acids)

SEQ ID NO: 87: -0.624779, 114, novel, similar to C-terminal
part of Gp14 [Bacteriophage HK97]
gi | 6901601 | gb | aaF31104.1 | (60% identity in 96 amino acids),
probably produced by translational frameshift 
5840 SEQ ID NO: 88: -0.311204, 1081, a putative tail assembly 
chaperone, similar to tail assembly chaperone [Bacteriophage 
HK97] gi | 6901600 | gb | aaF31103.1 | (62% identity in 124 amino 
acids) 

SEQ ID NO: 89: -0.146237, 94, a putative major tail
component, similar to major tail subunit [Bacteriophage HK97]
gi | 6901588 | gb | aaF31091.1 | AF069529#4 (68% identity in 234 
amino acids) 

SEQ ID NO: 90: -0.309678, 125, novel, similar to Gp11
[Bacteriophage HK97] gi | 6901599 | gb | aaF31102.1 | (49%
identity in 113 amino acids)

SEQ ID NO: 91: -0.186135, 239, novel, similar to phage
hypothetical protein Gp10 [Bacteriophage HK97]
gi | 6901598 | gb | aaF31101.1 | (75% identity in 148 amino acids)

SEQ ID NO: 92: 0.172807, 115, a putative head-tail adaptor,
similar to putative head-tail adaptors for example,
[Bacteriophage HK97] gi | 6901597 | gb | aaF31100.1 |
(45% identity in 111 amino acids) 
SEQ ID NO: 93: -0.512838, 149, novel 

SEQ ID NO: 94: -0.192241, 117, a putative portal protein, 
similar to portal proteins for example, [Bacteriophage D3]
gi | 5059250 | gb | aaD38955.1 | (24% identity in 366 amino acids)
SEQ ID NO: 95: -0.061111, 109, novel 

SEQ ID NO: 96 : -0.483469, 860, a putative major head 
protein/prohead protease, its N-terminal part similar to
putative prohead protease for example, [Rhodobacter
capsulatus] gi | 6467535 | gb | aaF13181.1 | AF181080#3 (30%
identity in 137 amino acids); its C-terminal part similar to
major head proteins for example, [Mycobacterium phage L5]
gi | 465114 | sp | Q05223 | VG17#BPML5 (23% identity in 280
amino acids)

SEQ ID NO: 97: -0.831147, 62, a putative terminase large
subunit, similar to hypothetical proteins for example, phage
D3 terminase-like protein [Haemophilus influenzae]
gi | 6739656 | gb | aaF27357.1 | AF198256#11 (22% identity in 472
amino acids)

SEQ ID NO: 98: -0.148992, 646, a putative terminase small 
subunit, similar to terminase small subunit - PBSX phage
Bacillus subtilis gi | 1722886 | sp | P39785 | XTMA#BACSU (42%
identity in 57 amino acids), GTG start 

5880 SEQ ID NO: 99: -0.117179, 554, novel 

SEQ ID NO: 100: -0.648128, 188, a putative DNase, similar 
to (at low level) DNase [Bacteriophage phi-C31]
gi | 1107475 | emb | Caa62587.1 | (28% identity in 85 amino acids)

SEQ ID NO: 101: -1.029787, 48, novel, similar to hypothetical
proteins for example, [Escherichia coli]
gi | 1778472 | gb | aaB40755.1 | (70% identity in 67 amino acids)

SEQ ID NO: 102: -0.468595, 122, a lipoprotein Rz1 precursor,
similar to lipoprotein Rz1 precursors for example,
[Bacteriophage 933W]
gi | 4585425 | gb | aaD25453.1 | AF125520#48 (98% identity in 61
amino acids) 

SEQ ID NO: 103: -0.717334, 76, an endopeptidase (cell lysis), 
identical to Rz [Bacteriophage VT2-Sa] 

gi | 5881639 | dbj | Baa84330.1 |; similar to Rz endopeptidases for
example, [Bacteriophage lambda]

gi | 119368 | sp | P00726 | ENPP#LAMBD (69% identity in 153 
amino acids) 

SEQ ID NO: 104: 0.214754, 62, a putative anti-repressor,
identical to Ant [Bacteriophage 933W]
gi | 4585423 | gb | aaD25451.1 | AF125520#46; its N-terminal part
(amino acids at the position 1-126) similar to anti-repressor Ant
[Bacteriophage P22] gi | 131843 | sp | P03037 | RANT#BPP22 (49%
identity in 126 amino acids)

SEQ ID NO: 105: -0.472903, 156, a putative endolysin,
similar to endolysins for example, [Bacteriophage 933W]
gi | 4585422 | gb | aaD25450.1 | AF125520#45 (96% identity in 177
amino acids) 

SEQ ID NO: 106 : -0.283069, 190, novel, similar to 
hypothetical protein YdfR (103 amino acids) [Escherichia coli]
gi | 3183262 | sp | P76160 | YDFR#ECOLI (45% identity in 74
amino acids) 

SEQ ID NO: 107: -0.466667, 178, a putative holin protein,
similar to holin proteins for example, [Bacteriophage H-19B]
gi | 2668771 | gb | aaD04658.1 | (97% identity in 68 amino acids)

SEQ ID NO: 108: -0.074561, 115, novel, similar to
hypothetical proteins for example, [Bacteriophage 933W]
(52% identity in 613 amino acids)

SEQ ID NO: 109: 0.142647, 69, novel 

SEQ ID NO: 110: -0.212987, 617, novel 
SEQ ID NO: 111: 0.459524, 43, novel, similar to tellurium
resistance proteins (TerB) for example, [Deinococcus
radiodurans] gi | 7473690 | pir | | C75302 (26% identity in 120
amino acids), TTG start

SEQ ID NO: 112: -0.452273, 89, novel, TTG start 
5925 SEQ ID NO: 113: -0.153521, 143, a putative antitermination 
protein, similar to antitermination Q proteins for 
example , [Bacteriophage 82] gi | 132277 | sp | P13870 | RegQ#BP82 
(75% identity in 229 amino acids) 

SEQ ID NO: 114: -0.142593, 55, a putative crossover junction
endodeoxyribonuclease, similar to Gp67 [Bacteriophage HK97]
gi | 6901639 | gb | aaF31142.1 | (64% identity in 114 amino acids);
crossover junction endodeoxyribonucleases Rus
(Holliday junction nuclease) (Holliday junction resolvase)
[Escherichia coli cryptic lambdoid prophage DLP12] (40%
identity in 110 amino acids)

SEQ ID NO: 115 : -0.425764, 230, similar to B1560#ECOLI 
gi| 1787843 (83% identity in 348 amino acids), GTG start 
SEQ ID NO: 116: -0.304202, 120, novel 
SEQ ID NO: 117: -0.39169, 350, novel

SEQ ID NO: 118: 0.15098, 52, novel

SEQ ID NO: 119: 1.332353, 35, novel, similar to hypothetical 
protein [Salmonella typhimurium] gi | 7467246 | pir | | T03012
(28% identity in 69 amino acids); Ren proteins for
example, [Bacteriophage H-19] gi | 2668762 | gb | aaD04649.1 |
(26% identity in 109 amino acids)

SEQ ID NO: 120: -0.410309, 195, novel, GTG start

SEQ ID NO: 121: -0.470229, 132, a putative DNA replication
protein, similar to DNA replication protein DnaC homologs for
example, [Escherichia coli] gi | 7429001 | pir | | C64886 (79%
identity in 246 amino acids)

SEQ ID NO: 122: -0.365766, 223, a putative replication
protein, its C-terminal-half part similar to replication proteins
for example, [Bacteriophage phi-80]
gi | 137940 | sp | P14815 | VG15#BPPH8 (34% identity in 148
amino acids); its N-terminal part similar to hypothetical
protein [Escherichia coli]

gi | 3025235 | sp | P75978 | YMFN#ECOLI (68% identity in 62 
amino acids) 

SEQ ID NO: 123: -0.47439, 247, novel, similar to hypothetical 
5960 protein YdaY [Escherichia coli K-12] 

gi | 3025103 | sp | P76064 | YDAT#ECOLI (30% identity in 141
amino acids) 

SEQ ID NO: 124 : -0.667987, 304, novel, similar to 
hypothetical protein YdaS [Escherichia coli] 

gi | 3025102 | sp | P76063 | YDAS#ECOLI (39% identity in 57
amino acids) 

SEQ ID NO: 125: -0.42695, 142, novel, similar to hypothetical 
protein b1145 [Escherichia coli cryptic prophage e14]
gi | 7444154 | pir | | F64859 (28% identity in 68 amino acids), TTG
start

SEQ ID NO: 126: -0.183, 101, novel 

SEQ ID NO: 127: -0.718055, 145, novel, similar to
hypothetical proteins for example, [Rhizobium sp. NGR234]
gi | 2496690 | sp | P55534 | Y4KP#RHISN (38% identity in 89
amino acids)

SEQ ID NO: 128: -1.053333, 76, novel 

SEQ ID NO: 129: -0.040217, 93, novel, GTG start 

SEQ ID NO: 130: -0.648148, 55, novel, similar to excisionases
for example, [Bacteriophage VT2-Sa]
dad | AP000363-2 | Baa84285.1 | (43% identity in 69 amino acids)

SEQ ID NO: 131: -0.001695, 119, novel [hypothetical
lipoprotein], similar to hypothetical proteins for
example, CJ0034c [Campylobacter jejuni]
gi | 6967539 | emb | CAB72527.1 (35% identity in 229 amino acids),
GTG start

SEQ ID NO: 1595: -0.731325, 84, a transposase (insertion
sequence IS629), similar to hypothetical proteins for
example, TnpE [Shigella flexneri]
gi | 5532454 | gb | aaD44738.1 | AF141323#9 (99% identity in 108
amino acids)

SEQ ID NO: 1684: -0.126695, 237, a transposase (OrfB)
(insertion sequence IS629), similar to transposase IS629
gi | 7443863 | pir | | T00315 (98% identity in 295 amino acids)

SEQ ID NO: 1647: -0.938889, 109, a putative integrase,
similar to integrases for example, [Bacteriophage S2]
gi | 1679807 | emb | Caa96221.1 | (57% identity in 331 amino 
acids) 

SEQ ID NO: 1648: -0.432542, 296, novel, similar to (at low
level) hypothetical protein b1839 [Escherichia coli]
gi | 7451973 | pir | | G64945 (33% identity in 109 amino acids)

SEQ ID NO: 1158: -0.498198, 334, novel, similar to (at low
level) cell division protein Div [Escherichia coli]
gi | 2507010 | sp | P15286 (27% identity in 121 amino acids)

SEQ ID NO: 1159: -0.102609, 116, a putative transcription
regulatory element, similar to putative transcription
regulatory elements for example, [Neisseria meningitidis]



Appendix B: Hideo et al. Full Translation



gi | 7226247 | gb | aaF41408.1 | (32% identity in 102 amino acids)

SEQ ID NO: 1160: -0.209722, 217, novel

SEQ ID NO: 1161: -0.639552, 135, a putative DNA-binding
protein, similar to putative DNA-binding protein Cox [Vibrio
cholerae Bacteriophage K139] gi | 4530499 | gb | aaD22064.1 |
(46% identity in 56 amino acids); phage hypothetical proteins
for example, [Bacteriophage S2] gi | 1679810 | emb | Caa96224.1 |
(42% identity in 61 amino acids); [Escherichia coli retron EC67]
gi | 141342 | sp | P21315 | YR7A#ECOLI (42% identity in 61 amino
acids) 

SEQ ID NO: 1162: -0.051111, 46, novel 
SEQ ID NO: 1163: 0.01194, 68, novel 
SEQ ID NO: 1164: -0.692241, 117, novel 
SEQ ID NO: 1165: -0.229348, 93, novel 
SEQ ID NO: 1166: -0.27625, 81, novel 
SEQ ID NO: 1167: -0.094928, 139, novel
SEQ ID NO: 1168: -0.673134, 68, novel 
SEQ ID NO: 1169: -0.281818, 89, novel, similar to
hypothetical proteins for example, [Shigella flexneri]
gi | 421263 | pir | | S34345 (41% identity in 84 amino acids)

SEQ ID NO: 1170: -0.030303, 100, a putative derepression
protein, similar to (at low level) derepression protein epsilon
[Bacteriophage P4] gi | 137833 | sp | P05463 | VEPS#BPP4 (32%
identity in 50 amino acids)

SEQ ID NO: 1171: -0.201464, 206, novel

SEQ ID NO: 1172 : -0.709211, 77, a putative replication 
protein, similar to replication proteins for example ,GpA 
[Bacteriophage 186] gi | 1351406 | sp | P41064 | VPA#BP186 (34% 
identity in 567 amino acids) 

SEQ ID NO: 1173 : -0.276033, 122, putative regulation of 
plasmid partition, similar to plasmid partition proteins for 
example ,par [Escherichia coli plasmid El] 

gi | 134954 | sp | P11904 | STBA#ECOLI (46% identity in 314
amino acids)

SEQ ID NO: 1174: -0.74575, 895, regulation of plasmid
partition, similar to plasmid partition proteins for
example, TSB [Escherichia coli plasmid NR1]
gi | 134956 | sp | P11906 | STBB#ECOLI (40% identity in 62 amino
acids)

SEQ ID NO: 1175: -0.094984, 320, a putative transposase, its
N-terminal part (amino acids at the position 1-103/217) is
identical to N-terminal part of transposase [Escherichia coli
plasmid pO-157 insertion sequence IS629]
gi | 7443862 | pir | | T00240 (1-103/296 amino acids), its
C-terminal part (amino acids at the position 104-217/217) is
identical to C-terminal part of transposase [Escherichia coli
plasmid pO-157 insertion sequence IS629]
gi | 7443862 | pir | | T00240 (183-296/296 amino acids)

SEQ ID NO: 1176: -0.466346, 105, a transposase, similar to
hypothetical proteins in insertion sequences for
example, [Escherichia coli plasmid pO-157 insertion
sequence IS629] gi | 7444868 | pir | | T00241 (96% identity in 108
amino acids) 

SEQ ID NO: 1177: -0.368996, 230, novel, similar to
hypothetical proteins for example, orf20 [Escherichia coli
plasmid pB171] gi | 6009396 | dbj | Baa84855.1 | (54% identity in
158 amino acids) (transferase) 

SEQ ID NO: 1178: -0.912037, 109, a putative tail protein, 
similar to tail proteins for example, F protein
[Bacteriophage 186] gi | 3337273 | gb | aaC34171.1 | (43% identity
in 151 amino acids) 

SEQ ID NO: 1179: -0.174684, 159, novel, similar to
C-terminal part of tail proteins for example, GpT
[Bacteriophage P2] gi | 3139112 | gb | aaD03293.1 | (39% identity
in 66 amino acids), GTG start, probably disrupted by frameshift

SEQ ID NO: 1180: -0.337037, 163, a putative tail protein,
similar to N-terminal part of tail proteins for example, GpT
[Bacteriophage P2] gi | 3337272 | gb | aaC34170.1 | (32% identity
in 848 amino acids), interrupted by frameshift

SEQ ID NO: 1181: -0.326978, 279, a putative phage tail
protein, similar to gi | 3139111 | gb | aaD03292.1 | (47% identity
in 42 amino acids)

SEQ ID NO: 1182: -0.055746, 697, a putative tail protein,
similar to tail proteins for example, GpE [Bacteriophage P2]
gi | 3139110 | gb | aaD03291.1 | (31% identity in 85 amino acids)

SEQ ID NO: 1183: -0.129487, 79, a putative tail tube
protein, similar to tail tube proteins for example, tail
protein FII [Bacteriophage 186]
gi | 139325 | sp | P22502 | VPF2#BPP2 (44% identity in 157 amino
acids) 

SEQ ID NO: 1184: -0.284298, 122, a putative tail sheath
protein, similar to tail sheath proteins for example, FI
[Pseudomonas aeruginosa bacteriophage phiCTX]
gi | 4063795 | dbj | Baa36249.1 | (47% identity in 377 amino acids)

SEQ ID NO: 1185: -0.266471, 171, a tail protein, similar to
N-terminal part of tail proteins for example, GpD
[Bacteriophage P2] gi | 6136287 | sp | P10312 | VPD#BPP2 (59%
identity in 70 amino acids)

SEQ ID NO: 1186: -0.193147, 395, a transposase, similar to
transposases for example, [Escherichia coli insertion sequence
IS30] gi | 2851554 | sp | P37246 | TRA8#ECOLI (99% identity in
342 amino acids) 

SEQ ID NO: 1187: -0.173832, 108, novel, GTG start 
6100 SEQ ID NO: 1188: -0.841108, 344, novel 

SEQ ID NO: 1189 : -0.626563, 65, similar to FLIC#ECOLI 
gi | 1788232 (55% identity in 585 amino acids) 

SEQ ID NO: 1190: -0.435484, 94, its N-terminal part (amino
acids at the position 1-104/379) similar to YEDM#ECOLI
gi | 1788245 (77% identity in 104 amino acids), its central part
(amino acids at the position 162-266/379) is similar to
YEDN#ECOLI gi | 1788244 (60% identity in 105 amino acids), its
C-terminal part (amino acids at the position 272-331/379) is
similar to B1933#ECOLI gi | 1788243 (46% identity in 59 amino
acids); similar to (at low level) YOPM#YERPE SP|P17778 (27%
identity in 181 amino acids)

SEQ ID NO: - : -0.296752, 586, similar to C-terminal part of 
YEDL#ECOLI gi | 1788242 (61-159/159 amino acids) (93%
identity in 99 amino acids) 

SEQ ID NO: - : -0.242216, 380, its N-terminal part (amino
acids at the position 1-104/379) is similar to YEDM#ECOLI
gi | 1788245 (76% identity in 104 amino acids), its central part
(amino acids at the position 162-266/379) is similar to
YEDN#ECOLI gi | 1788244 (61% identity in 105 amino acids), its
C-terminal part (amino acids at the position 272-331/379) is
similar to B1933#ECOLI gi | 1788243 (53% identity in 59 amino
acids); similar to (at low level) IPAH#SHIFL dad | M32063-1
(30% identity in 146 amino acids) 

SEQ ID NO: 1554: -0.263636, 100, novel, TTG start

6125 SEQ ID NO: - : -0.244327, 380, novel 

SEQ ID NO: - : -0.468966, 117, a putative secreted effector
protein, similar to hypothetical proteins for example, EspF
[Escherichia coli strain B10]
gi | 6090818 | gb | aaF03351.1 | AF116900#2 ESPF#ECOLI (39%
identity in 126 amino acids)

SEQ ID NO: 756 : -0.497235, 218, novel, similar to 
hypothetical protein [Bacteriophage 933W]
gi | 4585437 | gb | aaD25465.1 | AF125520#60 (93% identity in 89
amino acids) 

6135 SEQ ID NO: 757: -0.686944, 338, a putative bacteriophage 
tail fiber protein, similar to tail fiber proteins for 
example, [Bacteriophage 933W]
gi | 4585436 | gb | aaD25464.1 | AF125520#59 (38% identity in 370
amino acids) 

SEQ ID NO: 758: -0.324719, 90, a putative outer membrane
protein, similar to Lom outer membrane protein precursors
for example, [prophage P-EibA]
gi | 7532789 | gb | aaF63231.1 | AF151091#2 (68% identity in 199
amino acids) 

6145 SEQ ID NO: 759 : -0.67254, 438, a bacteriophage host 
specificity protein (partial), similar to C-terminal part of host 
specificity proteins for example, GpJ [Bacteriophage
lambda] gi | 138412 | sp | P03749 | VHSJ#LAMBD (58% identity in
788 amino acids), probably disrupted by frameshift 

6150 SEQ ID NO: 760 : -0.313568, 200, a bacteriophage host 
specificity protein (interrupted), similar to N-terminal part of 
host specificity proteins for example, protein J
[Bacteriophage lambda] gi | 138412 | sp | P03749 | (80% identity in
369 amino acids), GTG start, interrupted by frameshift 

SEQ ID NO: 761: -0.245668, 809, a putative tail assembly
protein, similar to tail assembly proteins for example, GpI
[Bacteriophage lambda] gi | 139637 | sp | P03730 | VTAI#LAMBD
(69% identity in 224 amino acids)

SEQ ID NO: 762: -0.365217, 392, bacteriophage tail assembly,
similar to tail assembly proteins for example, GpK
[Bacteriophage lambda] gi | 139638 | sp | P03729 | VTAK#LAMBD
(87% identity in 186 amino acids) 

SEQ ID NO: 763: 0.086667, 226, a possible bacteriophage tail 
component, similar to minor tail proteins for example, GpL
[Bacteriophage lambda] gi | 138844 | sp | P03738 | VMTL#LAMBD
(76% identity in 232 amino acids)

SEQ ID NO: 764: -0.344973, 190, a bacteriophage tail
component, similar to minor tail proteins for example, GpM
[Bacteriophage lambda] gi | 138845 | sp | P03737 | VMTM#LAMBD
(79% identity in 109 amino acids)

SEQ ID NO: 765: -0.3125, 233, tail length determination,
similar to C-terminal part of tail length tape measure protein
precursors for example, GpH [Bacteriophage lambda]
gi | 138843 | sp | P03736 | VMTH#LAMBD (80% identity in 253
amino acids), probably disrupted by frameshift

SEQ ID NO: 766: -0.43945, 110, bacteriophage tail length
determination, similar to N-terminal part tail length tape
measure proteins for example, GpH [Bacteriophage lambda]
gi | 138843 | sp | P03736 | VMTH#LAMBD (76% identity in 587
amino acids), interrupted by frameshift

SEQ ID NO: 767: -0.258268, 255, a bacteriophage tail
component, similar to minor tail proteins for example, GpT
[Bacteriophage lambda] gi | 138846 | sp | P03735 | VMTT#LAMBD
(78% identity in 96 amino acids), probably produced by
translational frameshift

SEQ ID NO: 768: -0.505, 621, a bacteriophage tail component,
similar to minor tail proteins for example, GpG
[Bacteriophage lambda] gi | 138842 | sp | P03734 | VMTG#LAMBD
(68% identity in 167 amino acids)

SEQ ID NO: 769: 0.034653, 102, novel 

SEQ ID NO: 770: -0.22028, 144, a bacteriophage head
component, similar to N-terminal part of major head proteins
for example, Gp7 [Bacteriophage 21]
gi | 547612 | sp | P36270 | HEAD#BPP21 (95% identity in 88 amino acids),
probably interrupted

SEQ ID NO: 771 : -0.239801, 202, a bacteriophage head 
component, similar to head decoration proteins for 
example, Gpshp [Bacteriophage 21]

gi | 549437 | sp | P36275 | VSHP#BPP21 (95% identity in 115 
amino acids) 

SEQ ID NO: 772: -0.331818, 89, a bacteriophage head-tail
preconnector, similar to minor head proteins for
example, head-tail preconnector Gp5 [Bacteriophage 21]
gi | 549296 | sp | P36273 | VG05#BPP21 (97% identity in 501 amino
acids), scaffold protein (302-501 amino acids) containing
homolog of Gp6 [Bacteriophage 21]

SEQ ID NO: 773: -0.024348, 116, a bacteriophage portal
protein, similar to portal proteins for example, Gp5
[Bacteriophage 21] gi | 549295 | sp | P36272 | VG04#BPP21 (98%
identity in 530 amino acids) 

SEQ ID NO: 774: 0.055688, 502, a putative head completion 
protein, similar to phage proteins for example, head
completion protein Gp3 [Bacteriophage 21]
gi | 549294 | sp | P36271 | VG03#BPP21 (98% identity in 68 amino
acids)

SEQ ID NO: 775: -0.448868, 531, a bacteriophage terminase
large subunit, similar to terminase large subunits for
example, Gp2 [Bacteriophage 21]

gi | 2851579 | sp | P36693 | TERL#BPP21 (91% identity in 637 
amino acids) 

SEQ ID NO: 776: -0.394118, 69, a possible bacteriophage
terminase small subunit, similar to terminase small subunits
for example, Gp1 [Bacteriophage N15] gi | 7444578 | pir | | T13087
(42% identity in 106 amino acids), GTG start 

SEQ ID NO: 777: -0.425233, 643, a transcription regulatory 
element, similar to PerC (BfpW) [Escherichia coli]
gi | 1172431 | sp | P43475 | PERC#ECOLI (47% identity in 87
amino acids)

SEQ ID NO: 778: -0.508875, 170, a lipoprotein precursor, 
similar to lipoprotein Rz1 precursors for
example, [Bacteriophage 933W]
gi | 4585425 | gb | aaD25453.1 | AF125520#48 (85% identity in 61
amino acids)

SEQ ID NO: 779: -0.983654, 105, an endopeptidase (host cell 
lysis), similar to Rz endopeptidases for
example, [Bacteriophage VT2-Sa]
gi | 5881639 | dbj | Baa84330.1 | (83% identity in 154 amino acids)

SEQ ID NO: 780: 0.178689, 62, novel

SEQ ID NO: 781: -0.26, 156, similar to possible endolysins, 
for example, R protein [Bacteriophage H-19B]
gi | 4335686 | gb | aaD17382.1 | (98% identity in 177 amino acids)

SEQ ID NO: 782: 0.62, 61, novel, similar to YdfR [Escherichia
coli] gi | 3183262 | sp | P76160 | YDFR#ECOLI (44% identity in 74
amino acids) 

SEQ ID NO: 783: -0.393785, 178, a possible holin protein (host 
cell lysis), similar to holin proteins for example, protein
[Bacteriophage VT2-Sa] gi | 5881636 | dbj | Baa84327.1 | (94%
identity in 68 amino acids)

SEQ ID NO: 784: -0.114912, 115, a transposase, identical to 
hypothetical protein [Escherichia coli plasmid pO-157
insertion sequence IS629] gi | 7444868 | pir | | T00241

SEQ ID NO: 785: 0.133823, 69, a transposase, identical to
transposase [Escherichia coli plasmid pO-157 insertion
sequence IS629] gi | 7443862 | pir | | T00240

SEQ ID NO: 786 : -0.965741, 109, novel, similar to 
hypothetical proteins for example, [Bacteriophage 933W]
gi | 4585419 | gb | aaD25447.1 | AF125520#42 (53% identity in 613
amino acids)

SEQ ID NO: 787: -0.397973, 297, novel, GTG start 

SEQ ID NO: 788: -0.243181, 617, novel, GTG start 

SEQ ID NO: 789: 0.475926, 55, novel, similar to putative
TerB proteins for example, [Deinococcus radiodurans]
gi | 7473690 | pir | | C75302 (26% identity in 120 amino acids)

SEQ ID NO: 790: 1.385455, 56, an antitermination, similar to 
antiterminators for example, protein Q [Bacteriophage 82]
gi | 132277 | sp | P13870 | RegQ#BP82 (75% identity in
229 amino acids) 

SEQ ID NO: 791: -0.143662, 143, a crossover junction
endodeoxyribonuclease, similar to Rus proteins for
example, [Bacteriophage 82] gi | 6901639 | gb | aaF31142.1 |
GP67#BPHK97 (63% identity in 112 amino acids); similar to
Gp67 [Bacteriophage HK97] gi | 6901639 | gb | aaF31142.1 | (63%
identity in 112 amino acids)

SEQ ID NO: 792 : -0.393886, 230, novel, similar to 
hypothetical proteins for example, b1560 [Escherichia coli]
gi | 7466196 | pir | | C64911 (85% identity in 348 amino acids),
GTG start

6280 SEQ ID NO: 793: -0.221009, 120, novel, similar to orf QD1 
[Bacteriophage N15] gi | 2564084 | gb | aaB81659.1 | (31% identity 
in 64 amino acids) 

SEQ ID NO: 794: -0.35702, 350, a prophage maintenance
(modulation of host-cell killing), similar to Hok/Gef family for
example, MokW [Bacteriophage 933W]
gi | 4585453 | gb | aaD25481.1 | AF125520#76 (87% identity in 70
amino acids) 

SEQ ID NO: 795: -1.208696, 93, novel 

SEQ ID NO: 796: 0.081429, 71, novel, its N-terminal part 
(amino acids at the position 1-46 amino acids) is similar to
GP45 [Bacteriophage N15] gi | 7521552 | pir | | T13131 (56%
identity in 46 amino acids); its N-terminal part (amino acids at 
the position 37-97) is similar to b2363 [Escherichia coli] 
gi | 7451977 | pir | | H65009 (73% identity in 61 amino acids)

SEQ ID NO: 797: 1.402941, 35, novel

SEQ ID NO: 798: -0.425134, 188, novel, GTG start 
SEQ ID NO: 799: -0.893204, 104, novel 
SEQ ID NO: 800: -1.069355, 63, novel

SEQ ID NO: 801 : -0.171186, 119, novel, similar to YdaW 
6300 [Escherichia coli] gi | 3025105 | sp | P76066 | YDAW#ECOLI (61% 
identity in 135 amino acids) 

SEQ ID NO: 802: -0.148649, 75, a putative phage replication
protein, similar to replication proteins for example, Gp14
[Bacteriophage phi-80] gi | 137937 | sp | P14814 | VG14#BPPH8
6305 (47% identity in 129 amino acids)

SEQ ID NO: 803: -0.504741, 233, novel, similar to replication
termination factor dnaT (primosomal protein I) [Escherichia
coli] gi | 1361001 | pir | | S56589 (30% identity in 85 amino acids)
SEQ ID NO: 804: -0.721364, 221, novel, similar to YdaT

6310 [Escherichia coli] gi | 3025103 | sp | P76064 | YDAT#ECOLI (31%

identity in 83 amino acids); similar to (at low level) regulatory
protein CII [Bacteriophage phi-80]
gi | 133360 | sp | P14820 | RPC2#BPPH8 (40% identity in 40 amino
acids)

6315 SEQ ID NO: 805: -0.660869, 346, a putative cell division 
control protein (repressor), similar to DicC (repressor protein 
of division inhibition gene dicB) [Escherichia coli]
gi | 118633 | sp | P06965 | DICC#ECOLI (31% identity in 72 amino 
acids) 

6320 SEQ ID NO: 806: -0.360284, 142, a possible repressor protein,
similar to repressor proteins for example, C2 repressor
[Bacteriophage P22] gi | 133359 | sp | P03035 | RPC2#BPP22 (30% 
identity in 203 amino acids) 
SEQ ID NO: 807: -0.694667, 76, novel 

6325 SEQ ID NO: 808 : -0.046047, 216, a possible cell division 
inhibitor, similar to DicB protein [Escherichia coli] 
gi | 2507009 | sp | P09557 | DICB#ECOLI (65% identity in 55
amino acids) 

SEQ ID NO: 809: -0.494, 51, novel, similar to hypothetical
6330 proteins for example, YdfD [Escherichia coli]

gi | 140587 | sp | P29010 | YDFD#ECOLI (46% identity in 62 amino
acids) 

SEQ ID NO: 810: -0.01129, 63, novel 
SEQ ID NO: 811: 0.119355, 63, novel
6335 SEQ ID NO: 812: -0.751913, 733, novel 

SEQ ID NO: 813: -0.487736, 107, an integrase, similar to
integrases for example, [Bacteriophage HK022]

gi | 138560 | sp | P16407 | VINT#BPHK0 (24% identity in 316
amino acids) 

6340 SEQ ID NO: 814: -0.347761, 68, novel, similar to hypothetical
proteins for example, L0013 [Escherichia coli O157:H7
strain EDL933] gi | 3414881 | gb | aaC31492.1 | (100% identity in
133 amino acids), GTG start

SEQ ID NO: 815: -0.722352, 341, novel, similar to
6345 hypothetical proteins for example, L0014 [Escherichia coli
O157:H7 strain EDL933] gi | 3288157 | emb | Caa11510.1 | (100%






identity in 115 amino acids) 

SEQ ID NO: 1581: -0.388722, 134, novel, similar to
hypothetical proteins for example, L0015 [Escherichia coli
6350 O157:H7 strain EDL933] gi | 3414883 | gb | aaC31494.1 | (100%
identity in 512 amino acids)
SEQ ID NO: 1582: 0.010435, 116, novel

SEQ ID NO: 1583: -0.445312, 513, a transposase (insertion
sequence IS629), similar to IS629 hypothetical proteins for

6355 example, [Escherichia coli plasmid pO157]

gi | 7444868 | pir | | T00241 (96% identity in 108 amino acids)
SEQ ID NO: 1349: -0.262963, 55, a transposase (insertion
sequence IS629), similar to IS629 transposase [Escherichia coli
plasmid pO157] gi | 7443862 | pir | | T00240 (96% identity in

6360 296 amino acids)

SEQ ID NO: 1350: -0.942593, 109, novel, partially similar
to hypothetical proteins for example, YjdA [Escherichia coli]
gi | 731985 | sp | P16694 | YJDA#ECOLI (17% identity in 236
amino acids) (at low level)

6365 SEQ ID NO: 1351 : -0.402027, 297, novel, similar to 
hypothetical protein YjcZ [Escherichia coli] 

gi | 731984 | sp | P39267 | YJCZ#ECOLI (29% identity in 278 amino
acids), GTG start 

SEQ ID NO: 1352: -0.652559, 294, novel, similar to (at low
6370 level) hypothetical proteins for example, [Xanthomonas
campestris] gi | 6689533 | emb | CAB65709.1 | (44% identity in 74
amino acids) 

SEQ ID NO: 1353: -0.372093, 302, novel 
SEQ ID NO: 1354: 0.036798, 357, novel 
6375 SEQ ID NO: 1355 : -0.067841, 228, novel, similar to 
hypothetical proteins for example, YafZ [Escherichia coli]
gi | 2495487 | sp | P77206 | YAFZ#ECOLI (75% identity in 272
amino acids) 

SEQ ID NO: 1356: -0.074265, 137, a putative antirestriction
6380 protein, similar to hypothetical proteins for example, YfjX
[Escherichia coli] gi | 1723636 | sp | P52139 | YFJX#ECOLI (68%
identity in 152 amino acids); similar to antirestriction proteins
for example, KlcA protein [plasmid RK2]

gi | 1730051 | sp | P52603 | KLA2#ECOLI (38% identity in 139
6385 amino acids)

SEQ ID NO: 1357: -0.550183, 274, an acetyltransferase,
identical to WbdR [Escherichia coli O157:H7 C664-1992]
gi | 3435182 | gb | aaC32350.1 |

SEQ ID NO: 1358: -0.385535, 160, novel, similar to
6390 C-terminal part of H repeat-associated proteins for
example, [Escherichia coli]

gi | 140772 | sp | P28912 | YHHI#ECOLI (66% identity in 36 amino
acids), TTG start

SEQ ID NO: 1259: 0.180543, 222, novel, similar to H
6395 repeat-associated proteins for example, [Escherichia coli]
gi | 140772 | sp | P28912 | YHHI#ECOLI (75% identity in 49 amino
acids)

SEQ ID NO: 1260: 0.204, 51, novel, similar to H
repeat-associated proteins for example, [Escherichia coli]
6400 gi | 140772 | sp | P28912 | YHHI#ECOLI (83% identity in 36 amino
acids), GTG start

SEQ ID NO: 1261: -0.351852, 55, a phosphomannomutase,
identical to ManB [Escherichia coli O157:H7 C664-1992]
gi | 3435181 | gb | aaC32349.1 |

6405 SEQ ID NO: 1262: -0.141667, 37, a mannose-1-P
guanosyltransferase, identical to ManC [Escherichia coli
O157:H7 C664-1992] gi | 3435180 | gb | aaC32348.1 |
SEQ ID NO: 1263: -0.222368, 457, a probable GDP-L-fucose
pathway enzyme, identical to WbdQ [Escherichia coli

6410 O157:H7 C664-1992] gi | 3435179 | gb | aaC32347.1 |

SEQ ID NO: 1264: -0.221577, 483, a fucose synthetase,
identical to Fcl [Escherichia coli O157:H7 C664-1992]
gi | 4867922 | dbj | Baa77731.1 |

SEQ ID NO: 1265: -0.168047, 170, a GDP-D-mannose
6415 dehydratase, identical to Gmd [Escherichia coli O157:H7
C664-1992] gi | 3435177 | gb | aaC32345.1 |

SEQ ID NO: 1266: -0.264486, 322, a (e) glycosyl transferase,
similar to WbdP [Escherichia coli O157:H7 C664-1992]
gi | 3435176 | gb | aaC32344.1 |
6420 SEQ ID NO: 1267: -0.261021, 373, a perosamine synthetase,
identical to Per [Escherichia coli O157:H7 C664-1992]
gi | 3435175 | gb | aaC32343.1 |

SEQ ID NO: 1268: -0.176485, 405, an O antigen flippase,
identical to Wzx [Escherichia coli O157:H7 C664-1992]
6425 gi | 3435174 | gb | aaC32342.1 |

SEQ ID NO: 1269: -0.321585, 367, a probable glycosyl
transferase, identical to WbdO [Escherichia coli O157:H7
C664-1992] gi | 3435173 | gb | aaC32341.1 |

SEQ ID NO: 1270: 0.75141, 462, an O antigen polymerase,
6430 identical to Wzy [Escherichia coli O157:H7 C664-1992]
gi | 3435172 | gb | aaC32340.1 |, GTG start

SEQ ID NO: 1271: -0.16371, 249, a (e) glycosyl transferase,
identical to WbdN [Escherichia coli O157:H7 C664-1992]
gi | 4867915 | dbj | Baa77724.1 |
6435 SEQ ID NO: 1272: 0.558884, 395, a putative UDP-galactose
4-epimerase, similar to putative UDP-galactose 4-epimerase
[Vibrio cholerae] gi | 3724321 | dbj | Baa33610.1 (27% identity in
329 amino acids)

SEQ ID NO: 1273: -0.404615, 261, novel, similar to
6440 hypothetical proteins for

example, gi | 9106618 | gb | aaF84382.1 | AE003986#12 [Xylella
fastidiosa] (60% identity in 105 amino acids)

SEQ ID NO: 1638: -0.29577, 332, novel, similar to
hypothetical protein [Xylella fastidiosa]

6445 gb | aaF84486.1 | AE003993#5 (52% identity in 86 amino acids)
SEQ ID NO: 1692: -0.842857, 113, novel 
SEQ ID NO: 1693: -0.109375, 97, novel 

SEQ ID NO: 1588: -0.478481, 80, novel [putative outer
membrane protein; OMP]
6450 SEQ ID NO: 1589: -0.057391, 116, similar to YEHA#ECOLI
gi | 1788426 (44% identity in 207 amino acids) [putative type-1
fimbrial protein]

SEQ ID NO: 1590: 0.006731, 105, similar to YEHB#ECOLI
gi | 1788427 (92% identity in 826 amino acids); similar to usher

6455 protein MrkC [Klebsiella pneumoniae]

dad | M55912-4 | aaA25095.1 (32% identity in 810 amino acids)
SEQ ID NO: - : -0.098256, 345, similar to YEHC#ECOLI
gi | 1788428 (87% identity in 224 amino acids); similar to
chaperone MrkB [Klebsiella pneumoniae]

6460 dad | M55912-3 | aaA25094.1 (39% identity in 211 amino acids)

SEQ ID NO: - : -0.513075, 827, similar to YEHD#ECOLI
gi | 1788429 (85% identity in 180 amino acids); AC/I pili
protein [Escherichia coli] dad | X76121-1 | Caa53727.1 (28%
identity in 177 amino acids)

6465 SEQ ID NO: - : -0.266071, 225, similar to YEHE#ECOLI
gi | 1788430 (69% identity in 93 amino acids)

SEQ ID NO: - : 0.199444, 181, a putative molybdate
metabolism regulator, similar to N-terminal part of molybdate
metabolism regulator MolR [Escherichia coli]

6470 gi | 7466653 | pir | | B64979 (amino acids at the position
1-244/1264) (37% identity in 249 amino acids), GTG start
SEQ ID NO: - : -0.272043, 94, a putative molybdate
metabolism regulator, similar to C-terminal part of molybdate
metabolism regulator MolR [Escherichia coli]

6475 gi | 465576 | sp | P33345 | MOLR#ECOLI (45% identity in 1000
amino acids), GTG start

SEQ ID NO: - : -0.647107, 243, identical to transposase (OrfB)
(insertion sequence IS629), gi | 7443862 | pir | | T00240
SEQ ID NO: 1509: -0.306124, 948, similar to transposase
6480 (OrfA) (insertion sequence IS629), gi | 7444868 | pir | | T00241
(99% identity in 108 amino acids)
SEQ ID NO: 1650: -0.397973, 297, novel
SEQ ID NO: 1651: -0.958333, 109, novel, similar to
hypothetical proteins for example, [Bacteriophage 933W]
6485 gi | 4585437 | gb | aaD25465.1 | AF125520#60 (97% identity in 102
amino acids), TTG start

SEQ ID NO: 555: -0.584146, 83, a putative tail fiber protein,
similar to tail fiber proteins for example, [Bacteriophage
933W] gi | 4585436 | gb | aaD25464.1 | AF125520#59 (36% identity

6490 in 361 amino acids)

SEQ ID NO: 556: -0.411765, 103, a putative outer membrane
protein Lom precursor, similar to Lom precursors for
example, [Bacteriophage P-EibA]

gi | 7532789 | gb | aaF63231.1 | AF151091#2 (76% identity in 199

6495 amino acids)

SEQ ID NO: 557: -0.679634, 438, a putative host specificity
protein (partial), similar to C-terminal part of host specificity
proteins for example, GpJ [Bacteriophage lambda]

gi | 138412 | sp | P03749 | VHSJ#LAMBD (62% identity in 775

6500 amino acids), GTG start

SEQ ID NO: 558: -0.288442, 200, a putative host specificity
protein (interrupted), similar to N-terminus of host specificity
proteins for example, GpJ [Bacteriophage lambda]

gi | 138412 | sp | P03749 | VHSJ#LAMBD (80% identity in 369

6505 amino acids), GTG start, probably truncated by frameshift

SEQ ID NO: 559: -0.197032, 776, a putative tail assembly
protein, similar to tail assembly proteins for example, GpI
[Bacteriophage lambda] gi | 139637 | sp | P03730 | VTAI#LAMBD
(69% identity in 224 amino acids)

6510 SEQ ID NO: 560: -0.365217, 392, a putative tail assembly
protein, similar to tail assembly proteins for example, GpK
[Bacteriophage lambda] gi | 139638 | sp | P03729 | VTAK#LAMBD
(86% identity in 196 amino acids)

SEQ ID NO: 561: 0.086667, 226, a putative minor tail
6515 protein, similar to minor tail proteins for
example, GpL [Bacteriophage lambda]
gi | 138844 | sp | P03738 | VMTL#LAMBD (76% identity in 232
amino acids)

SEQ ID NO: 562: -0.32996, 248, a putative minor tail protein,
6520 similar to minor tail proteins for example, GpM
[Bacteriophage lambda]

gi | 138845 | sp | P03737 | VMTM#LAMBD (79% identity in 109
amino acids)

SEQ ID NO: 563: -0.3125, 233, a putative tail length tape
6525 measure protein precursor, similar to tail length tape measure
protein precursors for example, GpH [Bacteriophage lambda]
gi | 138843 | sp | P03736 | VMTH#LAMBD (49% identity in 876
amino acids)

SEQ ID NO: 564: -0.43945, 110, a putative minor tail
6530 protein, similar to minor tail proteins for example, GpT
[Bacteriophage lambda]
gi | 138846 | sp | P03735 | VMTT#LAMBD (70% identity in 102
amino acids), probably produced by translational frameshift
SEQ ID NO: 565: -0.353916, 882, a putative minor tail
6535 protein, similar to minor tail proteins for example, GpG
[Bacteriophage lambda] gi | 138842 | sp | P03734 | VMTG#LAMBD
(43% identity in 140 amino acids)
SEQ ID NO: 566: -0.358824, 103, novel 

SEQ ID NO: 567: -0.545714, 141, a putative minor tail
6540 protein U, similar to minor tail proteins for example, GpU
[Bacteriophage lambda] gi | 138847 | sp | P03732 | VMTU#LAMBD
(55% identity in 132 amino acids)

SEQ ID NO: 568: -0.34, 251, a putative minor tail protein,
similar to minor tail proteins for example, GpZ
6545 [Bacteriophage lambda] gi | 138849 | sp | P03731 | VMTZ#LAMBD
(52% identity in 206 amino acids)
SEQ ID NO: 569: -0.141667, 133, novel

SEQ ID NO: 570: -0.45942, 208, novel (hypothetical
membrane protein)
6550 SEQ ID NO: 571: -0.103226, 94, novel
SEQ ID NO: 572: 0.549074, 109, a transposase (OrfA)
(insertion sequence IS629), identical to hypothetical protein
[Escherichia coli plasmid pO157 insertion sequence IS629]
gi | 7444868 | pir | | T00241
6555 SEQ ID NO: 573: -0.202367, 339, a transposase (OrfB)
(insertion sequence IS629), identical to transposase [Escherichia
coli plasmid pO157 insertion sequence IS629]

gi | 7443862 | pir | | T00240

SEQ ID NO: 574: -0.958333, 109, a putative

6560 protease/scaffold protein, similar to ClpP proteases for
example, [Bacteriophage D3] gi | 5059251 | gb | aaD38956.1 | (39%
identity in 195 amino acids); putative scaffolding protein
[Streptococcus thermophilus bacteriophage DT1]

gi | 4530143 | gb | aaD21883.1 | (31% identity in 193 amino acids),
6565 GTG start

SEQ ID NO: 575: -0.397973, 297, a putative portal protein, 
similar to portal protein-like protein [Wolbachia sp. wKue] 
gi | 6723246 | dbj | Baa89642.1 | (24% identity in 438 amino 
acids); similar to (at low level) portal proteins for
6570 example, gp4 [phage 21] gi | 549295 | sp | P36272 | VG04#BPP21
(20% identity in 368 amino acids) 
SEQ ID NO: 576: -0.101359, 369, novel 

SEQ ID NO: 577: -0.4932, 501, a putative terminase large 
subunit, similar to terminase large subunit-like protein 

6575 [Wolbachia sp. wKue] gi | 6723244 | dbj | Baa89640.1 | (25%
identity in 629 amino acids); terminase large subunits for
example, GpA [Bacteriophage lambda]

gi | 137616 | sp | P03708 | TERL#LAMBD (23% identity in 615
amino acids), GTG start 

6580 SEQ ID NO: 578: -0.598718, 79, novel 

SEQ ID NO: 579: -0.665488, 708, a lipoprotein Rz1 precursor,
similar to lipoprotein Rz1 precursors for

example, [Bacteriophage 933W]

gi | 4585425 | gb | aaD25453.1 | AF125520#48 (98% identity in 61
6585 amino acids)

SEQ ID NO: 580: -0.458861, 159, an endopeptidase (host cell
lysis), identical to Rz [Bacteriophage VT2-Sa]

gi | 5881639 | dbj | Baa84330.1 |; similar to endopeptidases for
example, Rz [Bacteriophage lambda]

6590 gi | 119368 | sp | P00726 | ENPP#LAMBD (69% identity in 153
amino acids)

SEQ ID NO: 581: 0.214754, 62, a putative antirepressor
protein, identical to putative antirepressor protein
[Bacteriophage 933W]

6595 gi | 4585423 | gb | aaD25451.1 | AF125520#46; its N-terminal part
(amino acids at the position 1-126) is similar to antirepressor
protein Ant [Bacteriophage P22] (49% identity in 126 amino
acids)

SEQ ID NO: 582: -0.472903, 156, a putative endolysin,
6600 identical to endolysin [Bacteriophage 933W]

gi | 4585422 | gb | aaD25450.1 | AF125520#45; similar to
endolysins for example, R protein [Bacteriophage H-19B]
gi | 4335686 | gb | aaD17382.1 | (93% identity in 177 amino acids)
SEQ ID NO: 583: -0.283069, 190, a putative holin protein,
6605 identical to putative holin [Bacteriophage 933W]

gi | 4499808 | emb | CAB39307.1 |; similar to holin proteins for
example, protein [Bacteriophage 21]

gi | 138706 | sp | P27360 | VLYS#BPP21 (77% identity in 71 amino 
acids) 

6610 SEQ ID NO: 584: -0.449153, 178, novel, similar to
hypothetical proteins for example, [Shigella dysenteriae]
gi | 6759966 | gb | aaF28124.1 | AF153317#20 (91% identity in 81
amino acids)

SEQ ID NO: 585: 0.039437, 72, novel, identical to
6615 hypothetical protein [Bacteriophage 933W]

gi | 4499806 | emb | CAB39305.1 |

SEQ ID NO: 586: -0.312346, 82, novel, similar to hypothetical
proteins for example, [Bacteriophage VT2-Sa]
gi | 5881634 | dbj | Baa84325.1 | (92% identity in 649 amino acids)
6620 SEQ ID NO: 587: 0.008475, 60, a Shiga toxin I subunit B
precursor, identical to Shiga toxin I subunit B precursor
gi | 134539 | sp | P08027 | SLTB#BPH30

SEQ ID NO: 588: -0.218518, 649, a Shiga toxin I subunit A
precursor, identical to Shiga toxin I subunit A precursor

6625 [Shigella dysenteriae] gi | 134537 | sp | P10149 | SLTA#BPH30

SEQ ID NO: 589: 0.031461, 90, an antitermination protein,
similar to antitermination proteins for example, protein Q
[Bacteriophage H-19B] (95% identity in 144 amino acids)
SEQ ID NO: 590: 0.083492, 316, novel, similar to

6630 hypothetical proteins for example, Nin 68 [Bacteriophage
lambda] gi | 1351593 | sp | P03771 | Y68#LAMBD (80% identity in
60 amino acids)

SEQ ID NO: 591: -0.268056, 145, novel, similar to
hypothetical proteins for example, NinG protein
6635 [Bacteriophage 21] gi | 4539482 | emb | CAB39991.1 | (90%
identity in 201 amino acids)

SEQ ID NO: 592: -0.534375, 65, novel, similar to hypothetical
proteins for example, NinF [Bacteriophage P22]

gi | 512350 | emb | Caao5162.1 | (96% identity in 58 amino acids)

6640 SEQ ID NO: 593: -1.045273, 202, novel 

SEQ ID NO: 594: -0.286957, 70, novel, identical to
hypothetical protein [Bacteriophage VT2-Sa]

gi | 5881625 | dbj | Baa84316.1 |; similar to Nin E proteins for
example, [Bacteriophage 21] (100% identity in 57 amino acids)

6645 SEQ ID NO: 595: -0.939098, 134, novel, similar to
hypothetical proteins for example, [Bacteriophage VT2-Sa]
gi | 5881624 | dbj | Baa84315.1 | (98% identity in 175 amino
acids); DNA N-6-adenine-methyltransferase [Bacteriophage T1]
(31% identity in 143 amino acids)

6650 SEQ ID NO: 596: -1.339655, 59, novel, similar to hypothetical
proteins for example, [Bacteriophage 933W]

gi | 4585410 | gb | aaD25438.1 | AF125520#33 (98% identity in
148 amino acids); Nin B [Bacteriophage 21]
gi | 4539479 | emb | CAB39988.1 | (43% identity in 147 amino
6655 acids)

SEQ ID NO: 597: -0.174286, 176, novel, similar to
hypothetical proteins for example, [Bacteriophage
933W] gi | 4585409 | gb | aaD25437.1 | AF125520#32 (99%
identity in 109 amino acids), GTG start

6660 SEQ ID NO: 598: -0.739189, 149, novel, similar to
hypothetical proteins for example, [Bacteriophage 933W]
gi | 4499788 | emb | CAB39287.1 | (97% identity in 92 amino acids)
SEQ ID NO: 599: 0.00851, 142, a Ren protein, similar to Ren
proteins for example, [Bacteriophage lambda]

6665 gi | 139473 | sp | P03761 | VREN#LAMBD (97% identity in 96
amino acids)

SEQ ID NO: 600: -0.872826, 93, a phage replication protein P,
similar to phage replication protein Ps for
example, [Bacteriophage lambda]

6670 gi | 139488 | sp | P03689 | VRPP#LAMBD (97% identity in 233
amino acids)

SEQ ID NO: 601: -0.0375, 97, a phage replication protein O,
similar to phage replication protein Os for
example, [Bacteriophage 933W]

6675 gi | 4585405 | gb | aaD25433.1 | AF125520#28 (99% identity in 312
amino acids)

SEQ ID NO: 602: -0.448927, 234, a regulatory protein CII,
similar to regulatory protein CIIs for

example, [Bacteriophage 933W]

6680 gi | 4585404 | gb | aaD25432.1 | AF125520#27 (94% identity in 98
amino acids)

SEQ ID NO: 603: -0.815064, 313, a putative regulatory
protein, similar to putative regulatory proteins for
example, [Bacteriophage VT2-Sa] gi | 5881616 | dbj | Baa84307.1 |
6685 (42% identity in 71 amino acids)

SEQ ID NO: 604: -0.220408, 99, a putative prophage
repressor CI, similar to prophage repressor CIs for
example, [Bacteriophage lambda]

gi | 133353 | sp | P03034 | RPC1#LAMBD (48% identity in 205
6690 amino acids)

SEQ ID NO: 605: -0.223611, 73, novel 

SEQ ID NO: 606 : -0.193868, 213, novel (hypothetical 
membrane protein) 

SEQ ID NO: 607: -0.194624, 94, a putative regulatory
6695 protein (transcription anti-termination), similar to putative
transcription anti-termination proteins for example, protein N
[Bacteriophage phi-21] gi | 132274 | sp | P07243 | REGN#BPPH3
(99% identity in 64 amino acids)
SEQ ID NO: 608: -0.036066, 184, novel
6700 SEQ ID NO: 609: -0.355556, 91, a putative superinfection
exclusion protein, similar to superinfection exclusion protein
B [Bacteriophage P22] gi | 585991 | sp | P38396 | SIEB#BPP22
(84% identity in 191 amino acids)

SEQ ID NO: 610: 0.358824, 52, a putative single-stranded
6705 DNA binding protein, identical to putative single-stranded
DNA binding protein [Bacteriophage 933W]; similar to
Ea10 (single-stranded DNA binding protein) [Bacteriophage
lambda] gi | 137630 | sp | P03757 | VE10#LAMBD (99% identity in
122 amino acids)

6710 SEQ ID NO: 611: -0.012435, 194, a regulatory protein cIII
(antitermination), identical to regulatory protein cIII
[Bacteriophage lambda] gi | 133366 | sp | P03044 | RPC3#LAMBD
SEQ ID NO: 612: -0.263935, 123, a Kil protein (host killing),
similar to Kil proteins for example, [Bacteriophage lambda]

6715 gi | 138622 | sp | P03758 | VKIL#LAMBD (97% identity in 89 amino
acids) 

SEQ ID NO: 613: -0.544444, 55, a host-nuclease inhibitor
protein Gam (interrupted), similar to N-terminal part of gam
[Bacteriophage lambda] (99% identity in 37 amino acids)
6720 SEQ ID NO: 614: -0.120225, 90, putative host-nuclease
inhibitor protein Gam, similar to C-terminal part of Gam
[Bacteriophage lambda] gi | 138128 | sp | P03702 | VGAM#LAMBD
(99% identity in 98 amino acids), probably disrupted by

frameshift

SEQ ID NO: 615: -0.28, 51, a recombination protein Bet,
identical to Bet protein [Bacteriophage 933W]

gi | 4585391 | gb | aaD25419.1 | AF125520#14; similar to Bet
protein [Bacteriophage lambda]

gi | 137511 | sp | P03698 | VBET#LAMBD (99% identity in 281
amino acids)

SEQ ID NO: 616: -0.707143, 99, an exonuclease, identical to
exonucleases [Bacteriophage 933W]

gi | 4585390 | gb | aaD25418.1 | AF125520#13; similar to
exonucleases for example, [Bacteriophage lambda]

gi | 119702 | sp | P03697 | EXO#LAMBD (97% identity in 225 amino
acids)

SEQ ID NO: 617: -0.509195, 262, novel, identical to
hypothetical protein [Bacteriophage 933W]

gi | 4585389 | gb | aaD25417.1 | AF125520#12; similar to

hypothetical protein orf60a [Bacteriophage lambda]

gi | 508995 | gb | aaA96568.1 | (95% identity in 62 amino acids)
SEQ ID NO: 618: -0.358667, 226, novel, identical to
hypothetical protein [Bacteriophage 933W]

gi | 4585388 | gb | aaD25416.1 | AF125520#11; similar to orf63
[Bacteriophage lambda] gi | 508994 | gb | aaA96567.1 | (88%
identity in 61 amino acids)

SEQ ID NO: 619: -0.13871, 63, novel, identical to
hypothetical proteins for example, [Bacteriophage 933W]
gi | 4585387 | gb | aaD25415.1 | AF125520#10; similar to
hypothetical protein orf61 [Bacteriophage lambda] (93%
identity in 46 amino acids)

SEQ ID NO: 620: -0.192064, 64, a putative C4-type zinc
finger protein (TraR family), similar to putative C4-type zinc
finger protein (TraR family) for
6755 example, gi | 7649830 | dbj | Baa94108.1 | (93% identity in 73
amino acids)

SEQ ID NO: 621: -0.410753, 94, novel, its N-terminal part is
similar to hypothetical proteins for example, [Bacteriophage
933W] gi | 4585455 | gb | aaD25483.1 | AF125520#78 (88% identity

6760 in 168 amino acids); its C-terminal part is similar to
hypothetical protein [Bacteriophage HK022]

gi | 6863138 | gb | aaF30379.1 | AF069308#27 (98% identity in 198
amino acids), GTG start
SEQ ID NO: 622: -0.617808, 74, novel 

6765 SEQ ID NO: 623: -0.622222, 316, novel, its N-terminal part
(amino acids at the position 1-44) is similar to hypothetical
proteins for example, [Bacteriophage 933W]

gi | 4585382 | gb | aaD25410.1 | AF125520#5 (84% identity in 44
amino acids)

6770 SEQ ID NO: 624: -0.068966, 59, novel, partially similar to
hypothetical proteins for example, [Bacteriophage 933W]
gi | 4585455 | gb | aaD25483.1 | AF125520#78 (41% identity in 90
amino acids)

SEQ ID NO: 625: -0.482204, 119, novel
6775 SEQ ID NO: 626: -0.8125, 121, a putative excisionase, similar
to putative excisionases for example, [Bacteriophage 933W]
gi | 4585379 | gb | aaD25407.1 | AF125520#2 (47% identity in 74
amino acids)

SEQ ID NO: 627: -0.72, 81, a putative integrase, similar to
6780 integrases for example, [Bacteriophage 933W]

gi | 4585378 | gb | aaD25408.1 | AF125520#1 (65% identity in 423
amino acids)

SEQ ID NO: 628: -0.803572, 85, a putative salicylate
hydroxylase, similar to salicylate hydroxylases for
6785 example, [Streptomyces coelicolor] gi | 7481300 | pir | | T36193
(31% identity in 348 amino acids)

SEQ ID NO: 629: -0.471028, 429, similar to probable
glutathione-S-transferase, glutathione-S-transferases for
example, [Pseudomonas sp. U2] gi | 3406829 | gb | aaC29501.1 |

6790 (43% identity in 210 amino acids)

SEQ ID NO: 1444: -0.21864, 398, a putative isomerase,
similar to isomerases for example, isomerase-decarboxylase
homolog [Pseudomonas sp. U2]

gi | 3406828 | gb | aaC29500.1 | (46% identity in 188 amino acids);

6795 similar to hypothetical protein Orf2 [Sphingomonas sp. RW5]
gi | 3550668 | emb | Caa12268.1 | (54% identity in 228 amino
acids)

SEQ ID NO: 1445: 0.236279, 216, probable gentisate
1,2-dioxygenase, similar to gentisate 1,2-dioxygenases for
6800 example, [Pseudomonas alcaligenes]

gi | 5733104 | gb | aaD49427.1 | AF173167#1 (53% identity in 333
amino acids); [Sphingomonas sp. RW5]

gi | 3550667 | emb | Caa12267.1 | (45% identity in 339 amino
acids)

6805 SEQ ID NO: 1446: -0.183691, 234, a putative transporter
protein, similar to transporter proteins for
example, 4-hydroxybenzoate transporter [Pseudomonas putida]
gi | 6093655 | sp | Q51955 | PCAK#PSEPU (42% identity in 420
amino acids)

6810 SEQ ID NO: 1447: -0.411988, 343, a putative regulatory
protein, similar to regulatory proteins for example, galactose
binding protein regulatory element [Azospirillum brasilense]
gi | 1730232 | sp | P52661 | GBPR#AZOBR (32% identity in 281
amino acids)

6815 SEQ ID NO: 1448: 0.803097, 453, a putative antibiotic
resistance protein, similar to antibiotic resistance protein
homolog YwoG [Bacillus subtilis] gi | 7474437 | pir | | B70065
(38% identity in 381 amino acids)

SEQ ID NO: 1449: -0.049371, 319, a putative transcription
6820 regulatory element, similar to putative transcription
regulatory elements for example, YvbU [Bacillus subtilis]
gi | 6648030 | sp | O32255 (32% identity in 266 amino acids)
SEQ ID NO: - : 0.973737, 397, novel

SEQ ID NO: - : 0.093836, 293, a transposase (OrfA) (insertion
6825 sequence IS629), hypothetical protein

gi | 7444868 | pir | | T00241

SEQ ID NO: 1623, ECs3123: 3078013-3079083; -0.672472, 357,
identical to transposase (OrfB) (insertion sequence IS629)
gi | 7443862 | pir | | T00240

6830 SEQ ID NO: 1653: -0.965741, 109, similar to B2332#ECOLI
gi | 7466328 | pir | | B65006 (41% identity in 289 amino acids)
SEQ ID NO: 1654: -0.397973, 297, similar to B2333#ECOLI
gi | 1788674 (56% identity in 174 amino acids); minor fimbrial
subunit StfG protein [Salmonella typhimurium]

6835 dad | AF093503-7 | aaC64157.1 (48% identity in 139 amino acids)
SEQ ID NO: 1572: -0.075, 281, similar to B2334#ECOLI
gi | 1788675 (53% identity in 141 amino acids); similar to minor
fimbrial subunits for example, StfF [Salmonella typhimurium]
gi | 3747033 (53% identity in 158 amino acids)

6840 SEQ ID NO: 1573: 0.123626, 183, similar to B2335#ECOLI
gi | 1788676 (47% identity in 166 amino acids); similar to minor
fimbrial subunit StfE protein [Salmonella typhimurium]
dad | AF093503-5 | aaC64155.1 (48% identity in 154 amino acids)
SEQ ID NO: 1574: -0.085256, 157, similar to YFCS#ECOLI

6845 gi | 1788677 (85% identity in 250 amino acids); periplasmic
fimbrial chaperone StfD protein [Salmonella typhimurium]
dad | AF093503-4 | aaC64154.1 (59% identity in 233 amino acids)
SEQ ID NO: 1575: 0.534337, 167, its N-terminal part (amino
acids at the position 1-581/883) is similar to YFCU#ECOLI

6850 gi | 1788679 (90% identity in 577 amino acids), its C-terminal
part (amino acids at the position 587-883/883) is similar to
B2337#ECOLI gi | 1788678 (88% identity in 297 amino acids)
SEQ ID NO: - : -0.305159, 253, similar to B2339#ECOLI
gi | 1788680 (88% identity in 187 amino acids); major fimbrial

6855 subunit StfA protein [Salmonella typhimurium]

dad | AF093503-2 | aaC64152.1 (39% identity in 187 amino acids)
SEQ ID NO: - : -0.461661, 880, a putative DNA injection
protein, its N-terminal part is similar to N-terminal part of
DNA injection protein gp20 [phage P22]

6860 gi | 1174950 | sp | Q01076 | VG20#BPP22 (47% identity in 217
amino acids); its C-terminal part is similar to (at low level)
C-terminal part of hypothetical proteins for

example, [Caenorhabditis elegans]
gi | 5805382 | gb | aaD51972.1 | AF173372#1 (34% identity in 76

6865 amino acids)

SEQ ID NO: - : -0.20107, 188, a putative DNA transfer
protein precursor, similar to DNA transfer protein Gp7
[Bacteriophage P22] gi | 418222 | sp | Q01074 | VG07#BPP22 (66%
identity in 207 amino acids)

6870 SEQ ID NO: 1289: -0.056085, 379, novel, similar to
hypothetical protein P31 [Bacteriophage APSE-1]

gi | 6118026 | gb | aaF03974.1 | AF157835#31 (35% identity in 152
amino acids); gp14 [Bacteriophage P22]

gi | 418225 | sp | Q01075 | VG14#BPP22 (22% identity in 143 amino

6875 acids)

SEQ ID NO: 1290: -0.180088, 227, novel

SEQ ID NO: 1291: -0.107742, 156, a putative replication
protein, partially similar to replication proteins for
example, [Haemophilus actinomycetemcomitans plasmid
6880 pVT736-1] gi | 398106 | gb | aaC37125.1 | (26% identity in 145
amino acids)

SEQ ID NO: 1292: 0.176842, 96, novel
SEQ ID NO: 1293: -0.803463, 232, novel
SEQ ID NO: 1294: -1.430769, 53, novel

6885 SEQ ID NO: 1295: -0.364681, 471, a putative resolvase,
similar to resolvases for example, [plasmid pM3]
gi | 5668998 | gb | aaD46124.1 | AF078924#3 (46% identity in 204
amino acids); [Yersinia pestis plasmid pMT1]

gi | 7467461 | pir | | T14990 (43% identity in 193 amino acids)

6890 SEQ ID NO: 1296: -0.218966, 59, a sucrose transporter protein,
similar to sucrose transporter protein (permease) [Escherichia
coli strain EC3132] gi | 231914 | sp | P30000 | CSCB#ECOLI (99%
identity in 415 amino acids)

SEQ ID NO: 1297: -0.367308, 209, a putative fructokinase,
6895 similar to fructokinase (EC 2.7.1.4) for example, [Escherichia
coli strain EC3132] gi | 730731 | sp | P40713 | SCRK#ECOLI (98%
identity in 291 amino acids)

SEQ ID NO: 1298: 0.823615, 416, a sucrose hydrolase, similar
to sucrose hydrolase [Escherichia coli strain EC3132]
6900 gi | 3462879 | gb | aaC33123.1 | (98% identity in 477 amino acids)
SEQ ID NO: 1299: 0.010855, 305, a sucrose operon repressor,
sucrose operon repressor [Escherichia coli]

similar to gi | 729214 | sp | P40715 | CSCR#ECOLI
(99% identity in 331 amino acids)
6905 SEQ ID NO: 1300: -0.532914, 478, similar to EryA homologue
[Bacteriophage If1] dad | U02303-9 | aaC62159.1 (76% identity in
333 amino acids)

SEQ ID NO: 1301 : -0.041088, 332, a putative transposase,
similar to transposase homolog A [Helicobacter pylori]
gi | 2114470 | gb | aaD11513.1 (58% identity in 137 amino acids)

SEQ ID NO: 1618: -0.604712, 383, similar to FLXA#ECOLI
gi | 2498386 | sp | P77609 (43% identity in 74 amino acids)

SEQ ID NO: - : -0.437222, 181, a putative polyferredoxin,
similar to ferredoxin [Methanosarcina thermophila]
gi | 282643 | pir | | A42960 (48% identity in 43 amino acids);
similar to polyferredoxin [Methanococcus voltae]
gi | 99156 | pir | | S24802 (22% identity in 207 amino acids)
SEQ ID NO: - : -0.478761, 114, a putative anaerobic dimethyl
sulfoxide reductase chain C, similar to anaerobic dimethyl
sulfoxide reductase chain Cs, for example, [Escherichia coli]
gi | 118699 | sp | P18777 | DMSC#ECOLI (27% identity in 271
amino acids)

SEQ ID NO: 1490: -0.1, 285, a putative anaerobic dimethyl
sulfoxide reductase chain B, similar to anaerobic dimethyl
sulfoxide reductase chain Bs, for example, [Escherichia coli]
gi | 2506394 | sp | P18776 | DMSB#ECOLI (59% identity in 185
amino acids)

SEQ ID NO: 1491 : 1.152381, 274, a putative anaerobic
dimethyl sulfoxide reductase chain A precursor, similar to
anaerobic dimethyl sulfoxide reductase chain A precursors, for
example, [Escherichia coli]
gi | 118697 | sp | P18775 | DMSA#ECOLI (43% identity in 768
amino acids)

SEQ ID NO: 1492: -0.325837, 210, novel, similar to DNA
damage-inducible proteins, for example, DinI [Escherichia coli]
gi | 2498305 | sp | Q47143 | DINI#ECOLI (43% identity in 81 amino
acids)

SEQ ID NO: 1493: -0.412988, 794, novel, similar to (at low
level) putative Cys3His zinc finger protein ATCTH
[Arabidopsis thaliana] gi | 1800279 | gb | aaB68046.1 | (37%
identity in 35 amino acids)

SEQ ID NO: 1061 : -0.60122, 83, a chaperone-like protein,
similar to TrcA-like proteins, for example, bfpT-regulated
chaperone-like protein TrcA [Escherichia coli
strain B171-8] gi | 4126789 | dbj | Baa36747.1 | (85% identity in
195 amino acids)

SEQ ID NO: 1062 : -0.528302, 54, novel, similar to
hypothetical proteins, for example, ORF2 [Escherichia coli
strain B171-8] gi | 4126790 | dbj | Baa36748.1 | (99% identity in
216 amino acids)

SEQ ID NO: 1063 : -0.526531, 197, novel, similar to
hypothetical protein ORF3 [Escherichia coli strain B171-8]
gi | 4126791 | dbj | Baa36749.1 | (98% identity in 352 amino acids)

SEQ ID NO: 1064 : -0.181019, 217, novel, similar to
hypothetical proteins, for example, ORF4 [Escherichia coli
strain B171-8] gi | 4126792 | dbj | Baa36750.1 | (99% identity in
140 amino acids)

SEQ ID NO: 1065 : -0.571307, 353, novel, similar to
hypothetical protein [Bacteriophage 933W]
gi | 4585437 | gb | aaD25465.1 | AF125520#60 (93% identity in 129
amino acids)

SEQ ID NO: 1066: -0.416429, 141, identical to transposase,
hypothetical protein [Escherichia coli plasmid pO157
insertion sequence IS629] gi | 7444868 | pir | | T00241; similar to
hypothetical protein, IS elements, for example, TnpE [Shigella
flexneri] gi | 5532454 | gb | aaD44738.1 | AF141323#9 (97%
identity in 108 amino acids)

SEQ ID NO: 1067: -0.251938, 130, a transposase, identical to
transposase [Escherichia coli plasmid pO157 insertion
sequence IS629] gi | 7443862 | pir | | T00240

SEQ ID NO: 1068: -0.965741, 109, novel, its N-terminal part
(amino acids at the position 1-87) is partially similar to
hypothetical proteins, for example, L0015 (amino acids at the
position 50-136/512) [Escherichia coli O-157:H7 strain EDL933]
gi | 3414883 | gb | aaC31494.1 |

SEQ ID NO: 1069 : -0.397973, 297, novel, identical to
hypothetical protein L0014 [Escherichia coli O-157:H7 strain
EDL933] gi | 3288157 | emb | Caa11510.1 | ; similar to hypothetical
proteins, for example, ORF50 [Escherichia coli]
gi | 6009426 | dbj | Baa84885.1 | (76% identity in 107 amino acids)

SEQ ID NO: 1070 : -0.501818, 166, novel, similar to
hypothetical proteins, for example, L0013 [Escherichia coli
O-157:H7 strain EDL933] gi | 3414881 | gb | aaC31492.1 | (100%
identity in 126 amino acids)

SEQ ID NO: 1071 : 0.010435, 116, a putative endolysin (host
cell lysis), similar to N-terminal-half part of endolysins, for
example, [Bacteriophage 933W]
gi | 4585422 | gb | aaD25450.1 | AF125520#45 (93% identity in 73
amino acids), probably interrupted

SEQ ID NO: 1072 : -0.403175, 127, novel, similar to
hypothetical protein YdfR [Escherichia coli]
gi | 3183262 | sp | P76160 | YDFR#ECOLI (47% identity in 74
amino acids)

SEQ ID NO: 1073: -0.144737, 77, a holin (host cell lysis),
similar to holin proteins, for example, [Bacteriophage VT2-Sa]
gi | 5881636 | dbj | Baa84327.1 | (90% identity in 91 amino acids)

SEQ ID NO: 1074 : -0.027193, 115, novel, similar to
hypothetical proteins, for example, [Bacteriophage 933W]
gi | 4585419 | gb | aaD25447.1 | AF125520#42 (52% identity in 613
amino acids)

SEQ ID NO: 1075: 0.095775, 72, novel 

SEQ ID NO: 1076 : -0.210048, 618, novel, similar to
hypothetical proteins, for example, [Actinobacillus
actinomycetemcomitans] gi | 7592819 | dbj | Baa94406.1 (29%
identity in 228 amino acids)

SEQ ID NO: 1077: 0.446789, 110, antitermination, similar to
antitermination proteins, for example, protein Q
[Bacteriophage lambda] gi | 132278 | sp | P03047 | REGQ#LAMBD
(97% identity in 207 amino acids)

SEQ ID NO: 1078: 0.628745, 248, a serine/threonine protein
phosphatase, similar to serine/threonine protein phosphatases,
for example, [Bacteriophage lambda]
gi | 130792 | sp | P03772 | PP#LAMBD (95% identity in 221 amino
acids)

7015 SEQ ID NO: 1079 : -0.263768, 208, novel, similar to 
hypothetical proteins for example ,NinG [Bacteriophage 21] 
gi | 4539482 | emb | CAB39991.1 | (89% identity in 199 amino 
acids) 

SEQ ID NO: 1080: -0.243891, 222, novel, similar to phage 
7020 hypothetical proteins for example , [Bacteriophage 

phi-Ye03-12] gi | 6598993 | emb | CAB63597.1 | (32% identity in 
110 amino acids) 

SEQ ID NO: 1081 : -1.078325, 204, a putative transposase,
similar to N-terminal part of transposases, for
example, [Escherichia coli insertion sequence IS30]
gi | 2851554 | sp | P37246 | TRA8#ECOLI (100% identity in 247
amino acids)

SEQ ID NO: 1082 : -0.772872, 189, novel, TTG start

SEQ ID NO: 1083: -0.849402, 252, novel

SEQ ID NO: 1084: -0.28168, 132, novel

SEQ ID NO: 1085: -1.133413, 423, novel

SEQ ID NO: 1086: -0.535766, 138, novel, its C-terminal part
is similar to CTP synthase - Rickettsia prowazekii
gi | 7438005 | pir | | C71695 (24% identity in 138 amino acids); its
N-terminal part is similar to hypothetical protein
Plasmodium falciparum gi | 4493974 | emb | CAB39033.1 | (24%
identity in 129 amino acids)

SEQ ID NO: 1087: -0.442424, 133, novel 

SEQ ID NO: 1088 : -0.501657, 544, a putative integrase,
similar to site specific recombinases, for
example, integrase-recombinase protein [Methanobacterium
thermoautotrophicum] gi | 7428936 | pir | | D69219 (27% identity
in 174 amino acids)

SEQ ID NO: 1089 : -0.314416, 438, novel (DNA-binding
protein), similar to putative DNA-binding protein
[Bacteriophage P4] gi | 140147 | sp | P12552 | Y9K#BPP4 (42%
identity in 50 amino acids); similar to hypothetical proteins,
for example, [Yersinia pestis] gi | 7467337 | pir | | T17447 (46%
identity in 40 amino acids)

SEQ ID NO: 1090 : -0.426185, 402, novel

SEQ ID NO: 1091 : -0.441176, 69, a putative regulatory
element, similar to regulatory proteins, for example, MocR
[Sinorhizobium meliloti] gi | 1346565 | sp | P49309 (34% identity
in 466 amino acids)

SEQ ID NO: 1092: -0.333569, 284, novel, similar to conserved
hypothetical protein [Streptomyces coelicolor A3(2)]
gi | 7649565 | emb | CAB89054.1 (38% identity in 141 amino acids)

SEQ ID NO: 1597 : -0.168469, 445, novel, similar to
N-terminal part of hypothetical proteins, for example, VdcD
[Streptomyces sp. D7] gi | 4741970 | gb | aaD28783.1 | AF134589#3
(57% identity in 71 amino acids); YclD [Bacillus subtilis]
gi | 7452267 | pir | | A69762 (48% identity in 68 amino acids)
SEQ ID NO: 1598 : -0.074126, 144, a putative
4-hydroxybenzoate decarboxylase, identical to YclC
[Escherichia coli O-157:H7 strain ?]
gi | 4887556 | emb | CAB43499.1 | (100% identity in 475 amino
acids); similar to VdcC [Streptomyces sp. D7]
gi | 6686069 | sp | Q9X697 | VDCC#STRD7 (72% identity in 474
amino acids); 4-hydroxybenzoate decarboxylase [Clostridium
hydroxybenzoicum]
gi | 5739200 | gb | aaD50377.1 | AF128880#1 (53% identity in 469
amino acids)

SEQ ID NO: 1541 : -0.65, 79, a putative phenylacrylic acid
decarboxylase, identical to Pad1 [Escherichia coli O-157:H7
strain ?] gi | 4887557 | emb | CAB43500.1 | ; similar to
phenylacrylic acid decarboxylases, for example, VdcB
[Streptomyces sp. D7] (73% identity in 190 amino acids)

SEQ ID NO: 1542: -0.214105, 476, a transcription regulatory
element, identical to SlyA [Escherichia coli O-157:H7 strain ?]
gi | 4887558 | emb | CAB43501.1 | ; similar to transcription
regulatory elements, for example, [Streptomyces coelicolor]
gi | 7481485 | pir | | T35022 (32% identity in 124 amino acids)
SEQ ID NO: 1543 : 0.027919, 198, novel, similar to
hypothetical proteins, for example, [Escherichia coli]
gi | 7404494 | sp | P45956 | YGBF#ECOLI (86% identity in 94
amino acids)

SEQ ID NO: 1544 : -0.374074, 136, novel, similar to
hypothetical protein b2755 [Escherichia coli strain K-12]
gi | 7460139 | pir | | G65056 (84% identity in 303 amino acids),
GTG start

SEQ ID NO: 1330 : 0.025773, 98, novel, similar to (at low
level) hypothetical protein b2756 [Escherichia coli strain
K-12] gi | 6136707 | sp | Q46897 | YGCH#ECOLI (28% identity in
200 amino acids)

SEQ ID NO: 1331 : -0.038111, 308, novel, similar to
hypothetical protein b2757 [Escherichia coli strain K-12]
gi | 7459357 | pir | | A65057 (35% identity in 160 amino acids)

SEQ ID NO: 1332 : -0.411111, 217, novel, similar to
hypothetical protein b2758 [Escherichia coli strain K-12]
gi | 7476186 | pir | | C70849 (32% identity in 93 amino acids)
[0022] 

5) Regulatory element 

Sequence number: hydrophobicity, The number of amino
acids, Character such as function

SEQ ID NO: 1333: -0.537097, 249, novel

SEQ ID NO: 1334 : -0.248718, 352, novel, similar to
hypothetical protein b2760 [Escherichia coli strain K-12]
gi | 7451979 | pir | | D65057 (24% identity in 303 amino acids)

SEQ ID NO: 1335 : -0.612921, 179, novel, similar to
hypothetical protein YgcB [Escherichia coli strain K-12]
gi | 2506493 | sp | P38036 | YGCB#ECOLI (28% identity in 778
amino acids), GTG start

SEQ ID NO: 1336: -0.429615, 521, similar to YBDY#ECOLI
gi | 3025009 | sp | P77091 (78% identity in 50 amino acids);
similar to SrnB [plasmid F] dad | AP001918-5 | Baa97875.1
(42% identity in 49 amino acids)

SEQ ID NO: 1337 : -0.257627, 886, novel, similar to
hypothetical proteins, for example, Tp70 [Treponema
pallidum] gi | 7521576 | pir | | A71309 (35% identity in 124 amino
acids)

SEQ ID NO: - : 0.81, 51, novel, similar to N-terminal part of
hypothetical proteins, for example, YgcG [Escherichia coli]
gi | 1723817 | sp | P55140 | YGCG#ECOLI (43% identity in 186
amino acids)

SEQ ID NO: 1512 : -0.608397, 132, novel

SEQ ID NO: 1513: 0.301786, 225, novel, its N-terminal part 
is similar to N-terminal part of hypothetical proteins for 
example ,YgcG [Escherichia coli] 



Appendix B: Hideo et al. Full Translation

gi | 1723817 | sp | P55140 | YGCG#ECOLI (31% identity in 147
amino acids)

SEQ ID NO: 1514 : 0.238, 51, similar to YGCG#ECOLI
gi | 1789140 (40% identity in 275 amino acids); similar to
hypothetical protein [Pseudomonas aeruginosa]
dad | AE004490-5 | aaG03925.1 (43% identity in 273 amino acids),
GTG start

SEQ ID NO: 1515: 0.225393, 383, a lipoprotein precursor (type
III secretion system), similar to type III secretion system
lipoprotein precursors, for example, PrgK protein [Salmonella
typhimurium] gi | 1172615 | sp | P41786 | PRGK#SALTY (53%
identity in 231 amino acids)

SEQ ID NO: - : 0.151648, 274, a type III secretion protein,
similar to MxiI [Shigella flexneri]
gi | 547954 | sp | Q06080 | MXII#SHIFL (32% identity in 93 amino
acids); PrgJ protein [Salmonella typhimurium]
gi | 1172614 | sp | P41785 | PRGJ#SALTY (31% identity in 87
amino acids)

SEQ ID NO: 1192: 0.037705, 245, a type III secretion protein,
similar to putative type III secretion proteins, for
example, PrgI protein [Salmonella
typhimurium] gi | 1172613 | sp | P41784 | PRGI#SALTY (64%
identity in 76 amino acids)

SEQ ID NO: 1193: -0.282727, 111, a putative adherence factor,
similar to a part of adherence factors, for example, Efa1
[Escherichia coli O111:H- strain E45035]
gi | 6013469 | gb | aaD49229.2 | AF159462#1 (amino acids at the
position 433-711/3223) (100% identity in 279 amino acids),
probably disrupted by frame shift

SEQ ID NO: 1194: -0.588608, 80, a transposase, identical to
transposase [Escherichia coli plasmid pO157 IS629]
gi | 7443862 | pir | | T00240

SEQ ID NO: 1195: -0.379918, 245, a transposase, identical to
hypothetical protein [Escherichia coli plasmid pO157
IS629] gi | 7444868 | pir | | T00241; similar to hypothetical
protein, insertion sequences, for example, [Shigella flexneri]
gi | 5532454 | gb | aaD44738.1 | AF141323#9 (98% identity in 108
amino acids)

SEQ ID NO: 1196: -0.045181, 167, novel, GTG start

SEQ ID NO: 1197 : -0.081233, 374, novel, similar to
hypothetical proteins, for example, L0014 [Escherichia coli
O-157:H7 strain EDL933] gi | 3414882 | gb | aaC31493.1 | (99%
identity in 115 amino acids)

SEQ ID NO: 1198 : 1.038462, 79, novel, similar to
hypothetical proteins, for example, L0015 [Escherichia coli
O-157:H7 strain EDL933] gi | 3414883 | gb | aaC31494.1 | (100%
identity in 411 amino acids)

SEQ ID NO: 1199: 0.805162, 151, novel, similar to a part of
hypothetical proteins, for example, L0013 [Escherichia coli
O-157:H7 strain EDL933] gi | 3414881 | gb | aaC31492.1 | (55%
identity in 28 amino acids), GTG start, probably disrupted

SEQ ID NO: 1200 : 0.976744, 87, novel, similar to
hypothetical proteins, for example, ORF50 [Escherichia coli
plasmid pB171] gi | 6009426 | dbj | Baa84885.1 | (70% identity in
106 amino acids)

SEQ ID NO: 1201 : 0.748416, 222, novel, similar to
hypothetical proteins, for example, L0015 [Escherichia coli
O-157:H7 strain EDL933] gi | 3414883 | gb | aaC31494.1 | (63%
identity in 464 amino acids)

SEQ ID NO: 1202: -0.236585, 329, novel, similar to a part of
transposases, for example, TnpA [Shigella flexneri]
gi | 5532449 | gb | aaD44733.1 | AF141323#4 (93% identity in 49
amino acids)

SEQ ID NO: 1203 : -1.506341, 208, novel, similar to
hypothetical proteins, for example, L0004 [Escherichia coli
O-157:H7 strain EDL933] gi | 3414872 | gb | aaC31483.1 | (98%
identity in 91 amino acids); putative transposase [Vibrio
cholerae] gi | 7960026 | gb | aaF71186.1 | AF179596#6 (59%
identity in 91 amino acids); hypothetical protein [Escherichia
coli plasmid pO157 insertion sequence IS911]
gi | 7465897 | pir | | T00224 (52% identity in 91 amino acids)

SEQ ID NO: 1204: -0.892208, 78, a putative transcription
regulatory element, similar to regulatory elements (RpiR
family), for example, [Bacillus subtilis]
gi | 8248807 | emb | CAB93068.1 | (25% identity in 236 amino
acids)

SEQ ID NO: 1205 : -1.002703, 112, a putative
ferrichrome-binding protein, similar to ferrichrome-binding
proteins, for example, [Bacillus subtilis]
gi | 585132 | sp | P37580 | FHUD#BACSU (27% identity in 220
amino acids)

SEQ ID NO: 1206 : -0.212558, 440, a putative ferrichrome
ABC transporter (permease), similar to ferrichrome ABC
transporters (permease), for example, [Bacillus subtilis]
gi | 1706797 | sp | P49937 | FHUG#BACSU (33% identity in 319
amino acids)

SEQ ID NO: 1207: 0.465452, 687, a putative ferrichrome ABC
transporter (permease), similar to ferrichrome ABC
transporters (permease), for example, [Synechocystis sp.]
gi | 7442493 | pir | | S74438 (43% identity in 315 amino acids);
[Bacillus subtilis] gi | 1706795 | sp | P49936 | FHUB#BACSU (39%
identity in 319 amino acids)

SEQ ID NO: 1208 : -0.209449, 382, a putative ABC-type
iron-siderophore transport system ATP-binding protein, similar
to ABC-type iron-siderophore transport system ATP-binding
proteins, for example, [Synechocystis sp.]
gi | 7442509 | pir | | S74440 (52% identity in 248 amino acids)

SEQ ID NO: 1209: -0.149383, 568, a putative ferrichrome-iron
receptor precursor, similar to ferrichrome-iron receptor
precursors, for example, gi | 7448497 | pir | | S74457 (30%
identity in 688 amino acids)

SEQ ID NO: 1210: 0.036546, 250, novel, TTG start

SEQ ID NO: 1211 : 1.166101, 60, a PTS dependent
N-acetyl-galactosamine-IID component (AgaE), similar to
PTS dependent N-acetyl-galactosamine-IID component, AgaE
[Escherichia coli strain C]
gi | 8895749 | gb | aaF81085.1 | AF228498#5 (96% identity in 292
amino acids)

SEQ ID NO: - : -0.257895, 77, a PTS dependent
N-acetyl-galactosamine- and galactosamine IIA component
(AgaF), similar to PTS dependent N-acetyl-galactosamine- and
galactosamine IIA component, AgaF [Escherichia coli strain C]
gi | 8895750 | gb | aaF81086.1 | AF228498#6 (99% identity in 144
amino acids)

SEQ ID NO: 1527 : 0.06993, 144, a transposase (insertion
sequence IS629), identical to hypothetical protein
gi | 7444868 | pir | | T00241

SEQ ID NO: 1528 : 1.167709, 193, identical to transposase
(insertion sequence IS629), gi | 7443862 | pir | | T00240

SEQ ID NO: 1529: 0.38766, 236, novel

SEQ ID NO: 1530: -0.008, 226, a leader peptidase, similar to
leader peptidases, for example, HopD (strain ECOR30)
[Escherichia coli] gi | 7674073 | sp | O68932 (92% identity in 155
amino acids); (LT2) [Salmonella typhimurium]
gi | 7674072 | sp | O68927 (68% identity in 148 amino acids)

SEQ ID NO: 1531 : -0.168, 226, novel, similar to hypothetical
protein [Xylella fastidiosa]
gi | 9112262 | gb | aaF85593.1 | AE003851#24 (50% identity in 86
amino acids)

SEQ ID NO: - : -0.265401, 238, a putative invasin, similar to
putative membrane protein b1978 [Escherichia coli K-12]
gi | 1736642 | dbj | Baa15799.1 | (45% identity in 1391 amino
acids); invasin [Yersinia pseudotuberculosis]
gi | 79202 | pir | | A29646 (35% identity in 1211 amino acids)
[0023]

6) Proteins relating to fimbriae
Sequence number: hydrophobicity, The number of amino acids,
Character such as function

SEQ ID NO: 1674 : -0.352675, 244, similar to replication
protein O, for example, protein O [Enterobacteria phage
HK022] gi | 407289 | gb | aaB60272.1 | (98% identity in 299 amino
acids)

SEQ ID NO: 1129 : -0.391449, 422, a replication protein P
(putative replication DNA helicase), similar to P proteins,
for example, [Enterobacteria phage HK022]
gi | 6863143 | gb | aaF30384.1 | AF069308#32 (99% identity in 478
amino acids); replication DNA helicases, for example, DnaB
[Escherichia coli] gi | 118713 | sp | P03005 | DNAB#ECOLI (39%
identity in 436 amino acids)

SEQ ID NO: 1130 : -0.275728, 207, novel, identical to
hypothetical protein [Bacteriophage VT2-Sa]
gi | 5881620 | dbj | Baa84311.1 | (100% identity in 89 amino acids)

SEQ ID NO: 1131 : -0.090099, 102, novel, identical to
hypothetical protein [Bacteriophage 933W]
gi | 4499788 | emb | CAB39287.1 | (100% identity in 92 amino
acids)

SEQ ID NO: 1132: -0.513839, 225, a type III secretion protein,
similar to PrgH protein [Salmonella typhimurium]
gi | 1172612 | sp | P41783 | PRGH#SALTY (28% identity in 266
amino acids); MxiG [Shigella flexneri]
gi | 2498603 | sp | Q57332 | MXIG#SHIFL (23% identity in 243
amino acids)

SEQ ID NO: 1133 : -0.08, 116, a putative transcription
regulatory element, similar to transcription activator
NtrC [Herbaspirillum seropedicae]
gi | 57313501 | gb | aaC32391.21 (25% identity in 107 amino acids)

SEQ ID NO: 1134: -0.503734, 483, a type III secretion protein,
similar to type III secretion proteins, for example, SpaS protein
[Salmonella typhimurium] gi | 730801 | sp | P40702 | SPAS#SALTY
(54% identity in 348 amino acids)

SEQ ID NO: 1135 : -0.293631, 315, novel

SEQ ID NO: 1136: -0.452748, 183, ABC transporter (binding
protein), similar to binding proteins, for
example, phosphate-binding protein PstS homolog
[Methanobacterium thermoautotrophicum (strain Delta H)]
gi | 7442891 | pir | | A69098 (32% identity in 187 amino acids)

SEQ ID NO: 1137 : 0.39434, 54, its N-terminal part (amino
acids at the position 1-77/505) is similar to
YZGL#ECOLI gi | 1789834 (83% identity in 77 amino acids); its
C-terminal part (amino acids at the position 325-519/525) is
similar to binding proteins, for example, phosphate-binding
protein PstS homolog [Methanobacterium thermoautotrophicum
strain Delta H] gi | 7442891 | pir | | A69098 (31% identity in 175
amino acids)

SEQ ID NO: 1138: 0.390909, 67, a putative DNA processing
chain A, similar to many DNA processing chain As (Smf
protein), for example, [Neisseria meningitidis]
gi | 7378929 | emb | CAB83472.1 | (30% identity in 265 amino
acids)

SEQ ID NO: 1139: -0.774999, 297, a putative ATP-dependent
DNA helicase (partial), similar to C-terminal part of
ATP-dependent DNA helicase [Streptomyces coelicolor]
gi | 7480492 | pir | | T35189 (64% identity in 37 amino acids), GTG
start

SEQ ID NO: 1140: -0.122667, 76, a putative ATP-dependent
DNA helicase (partial), similar to a part of ATP-dependent
DNA helicase [Streptomyces coelicolor]
gi | 7480492 | pir | | T35189 (31% identity in 269 amino acids),
GTG start

SEQ ID NO: 1141 : -0.286338, 550, a putative ATP-dependent
DNA helicase (partial), similar to a part of putative
ATP-dependent DNA helicase [Streptomyces coelicolor]
gi | 7480492 | pir | | T35189 (48% identity in 175 amino acids)

SEQ ID NO: 1142: -0.02069, 59, a putative ATP-dependent
DNA helicase (interrupted), similar to N-terminal part of
putative ATP-dependent DNA helicases, for
example, [Streptomyces coelicolor] gi | 7428315 | pir | | T35189
(60% identity in 176 amino acids); [Bacillus subtilis]
gi | 7436435 | pir | | F69901 (42% identity in 169 amino acids)
SEQ ID NO: 1143: -0.395745, 330, novel 

SEQ ID NO: 1144 : -0.477678, 225, novel (hypothetical
membrane protein)

SEQ ID NO: 1145 : -0.43168, 263, novel (hypothetical
membrane protein)

SEQ ID NO: 1146 : -0.74642, 434, novel, similar to
hypothetical protein ORF79 [Escherichia coli plasmid
pB171] gi | 6009455 | dbj | Baa84914.1 (62% identity in 175 amino
acids)

SEQ ID NO: 1147 : -0.610909, 276, novel, similar to
hypothetical protein ORF80 [Escherichia coli plasmid
pB171] (70% identity in 86 amino acids)

SEQ ID NO: 1148 : -0.397973, 297, novel (hypothetical
lipoprotein)

SEQ ID NO: 1149 : -0.965741, 109, a putative
O-methyltransferase, similar to a part of O-methyltransferases,
for example, acetylserotonin N-methyltransferase (EC 2.1.1.4) -
chicken gi | 2498445 | sp | Q92056 | HIOM#CHICK (28% identity in
157 amino acids)

SEQ ID NO: 1150: -0.836842, 39, novel

SEQ ID NO: 1151 : 0.029565, 116, a putative acyltransferase,
similar to acyltransferases, for example, [Neisseria meningitidis
MC58] gi | 7226953 | gb | aaF42046.1 | (33% identity in 246 amino
acids)

SEQ ID NO: 1152: -0.409503, 464, a putative acyl carrier
protein, similar to acyl carrier proteins, for
example, [Neisseria meningitidis MC58]
gi | 7226952 | gb | aaF42045.1 | (51% identity in 85 amino acids)

SEQ ID NO: 1153 : -0.178846, 53, a putative acyl carrier
protein, similar to acyl carrier proteins, for
example, [Neisseria meningitidis MC58]
gi | 7226951 | gb | aaF42044.1 | (51% identity in 79 amino acids)

SEQ ID NO: 1154 : -0.063793, 117, novel (hypothetical
membrane protein), similar to putative integral membrane
protein [Neisseria meningitidis] gi | 7380586 | emb | CAB85174.1 |
(51% identity in 126 amino acids)

SEQ ID NO: 1155: -0.55546, 468, novel, similar to peptide
synthetase [sic, synthase] [Xylella fastidiosa]
gi | 9105980 | gb | aaF83848.1 | AE003941#2 (26% identity in 420
amino acids); p-coumaryl-CoA ligase [Rhodobacter sphaeroides]
gi | 2764724 | emb | Caa05380.1 | a part of (27% identity in 268
amino acids); a part of surfactin synthetase component I
[Bacillus subtilis] gi | 2127235 | pir | | I40485 (20% identity in
410 amino acids)

SEQ ID NO: 1156 : -0.569643, 57, a putative
(3R)-hydroxymyristoyl-[acyl carrier protein] dehydratase,
similar to (at low level) a part of (3R)-hydroxymyristoyl-[acyl
carrier protein] dehydratases, for example, [Salmonella
typhimurium] gi | 140182 | sp | P21773 | FABZ#SALTY (29%
identity in 67 amino acids)

SEQ ID NO: - : -0.908772, 115, novel, its N-terminal part is
similar to dolichyl-phosphate mannose synthase related
proteins, for example, [Pyrococcus abyssi (strain Orsay)]
gi | 7445533 | pir | | A75176 (30% identity in 206 amino acids); its
N-terminal part is similar to HmsR [Yersinia pestis]
gi | 1185391 | gb | aaB66590.1 | (34% identity in 128 amino acids);
its C-terminal part is similar to hypothetical protein [Xylella
fastidiosa] gi | 9105669 | gb | aaF83585.1 | AE003918#7 (30%
identity in 310 amino acids)

SEQ ID NO: 1402 : 0.001017, 296, novel, similar to
hypothetical proteins, for example, [Deinococcus
radiodurans] gi | 7471367 | pir | | B75463 (31% identity in 111
amino acids), GTG start
SEQ ID NO: 1403: -0.013016, 316, novel

SEQ ID NO: 1404: 1.044986, 350, novel, similar to membrane
protein [Xylella
fastidiosa] gi | 9105671 | gb | aaF83587.1 | AE003918#9 (24%
identity in 502 amino acids)

SEQ ID NO: 1405: 1.132416, 328, novel 

SEQ ID NO: 1406 : -0.004833, 270, putative
3-oxoacyl-(acyl-carrier-protein) synthase II, similar to
3-oxoacyl-(acyl-carrier-protein) synthase IIs, for
example, [Streptomyces coelicolor A3(2)]
gi | 7479090 | pir | | T34912 (31% identity in 381 amino acids)

SEQ ID NO: 1407 : -0.402244, 714, a putative
beta-hydroxydecanoyl-ACP dehydrase, similar to hypothetical
protein [Neisseria meningitidis MC58]
gi | 7226956 | gb | aaF42049.1 | (32% identity in 116 amino acids);
beta-hydroxydecanoyl-ACP dehydrase [Pseudomonas
aeruginosa] gi | 2384563 | gb | aaC45619.1 | (29% identity in 123
amino acids)

SEQ ID NO: - : -0.405385, 131, a putative
3-oxoacyl-(acyl-carrier-protein) reductase, similar to
3-oxoacyl-(acyl-carrier-protein) reductases, for
example, [Neisseria meningitidis MC58]
gi | 7226957 | gb | aaF42050.1 | (57% identity in 242 amino acids)

SEQ ID NO: 1585 : 0.50548, 293, similar to putative
3-oxoacyl-(acyl-carrier-protein) synthase IIs, for
example, gi | 7226958 | gb | aaF42051.1 | (48% identity in 404
amino acids)

SEQ ID NO: 1586: 0.152083, 145, a putative transcription
regulatory element, similar to transcription regulatory
elements, for example, [Escherichia coli]
gi | 129347 | sp | P13669 | FARR#ECOLI (28% identity in 235
amino acids)

SEQ ID NO: 1656 : -0.965741, 109, a putative PTS 
(phosphotransferase system) system enzyme IIA, similar to PTS 






system enzyme IIA components, for example, [Escherichia coli
K-12] gi | 2507274 | sp | P37187 | PTKA#ECOLI (23% identity in
122 amino acids); PTS system fructose-specific enzyme II BC
component [Bacillus halodurans] gi | 4512375 | dbj | Baa75339.1 |
(33% identity in 151 amino acids)

SEQ ID NO: 1657: -0.397973, 297, a putative PTS system
enzyme IIB, similar to PTS system, galactitol-specific IIB
component [Escherichia coli K-12]
gi | 2507273 | sp | P37188 | PTKB#ECOLI (35% identity in 92
amino acids)

SEQ ID NO: - : 0.072131, 62, a putative PTS system enzyme
IIC, similar to PTS system galactitol-specific enzyme IICs, for
example, [Bacillus halodurans] gi | 4512376 | dbj | Baa75340.1 |
(45% identity in 411 amino acids)

SEQ ID NO: 1695: 0.74129, 156, a putative sugar kinase, 
similar to sugar kinases for example ,xylulokinase (EC 
2.7.1.17) [Lactobacillus pentosus] 

gi | 139850 | sp | P21939 | XYLB#LACPE (23% identity in 496 
amino acids) 

SEQ ID NO: 1678: -0.385107, 95, a putative PTS system HPr
enzyme, similar to phosphotransferase system HPr enzymes,
for example, [Xylella fastidiosa]
gi | 9106413 | gb | aaF84212.1 | AE003971#11 (39% identity in 87
amino acids)

SEQ ID NO: 1679: 0.150932, 162, a putative aldolase, similar
to aldolases, for example, [Vibrio furnissii]
gi | 1732204 | gb | aaC44684.1 | (38% identity in 272 amino acids)

SEQ ID NO: - : 0.763317, 200, novel, similar to HicB-related
protein [Xylella fastidiosa]
gi | 9106728 | gb | aaF84477.1 | AE003992#13 (35% identity in 110
amino acids); HicB [Haemophilus influenzae]
gi | 3603326 | gb | aaC35810.1 | (26% identity in 93 amino acids)
SEQ ID NO: 1548: -0.459394, 331, novel, similar to HicA
[Haemophilus influenzae] gi | 3603325 | gb | aaC35809.1 | (30%
identity in 60 amino acids)
[0024]

7) Proteins relating to transportation of substance
Sequence number: hydrophobicity, The number of amino acids,
Character such as function

SEQ ID NO: - : 0.123763, 506, a type III secretion protein, 
7475 similar to C-terminal part of type III secretion proteins for 
example ,SpaE. protein [Salmonella typhimurium] 

gi I 730799 1 sp I P40701 j SPAR#SALTY(56% identity in 65 amino 
acids), may be partial (disrupted by frameshift) 
SEQ ID NO: 1521 : -0.08725, 401, novel, similar to 
7480 hypothetical protein [Xylella fastidiosaj 

gi | 9112263 | gb ! aaF85594.1 I AE003851#25 (48% identity in 158 
amino acids) 

SEQ ID NO: 1522 : 0.754902,52, novel 

SEQ ID NO: 1523: -0.310185, 325, heme utilization/transporter 
protein, identical to ChuA [Escherichia coli O157:H7 EDL933]
gi | 1763009 | gb | aaC44857.1 |

SEQ ID NO: 1524: 0.080682, 177, novel, TTG start 
SEQ ID NO: 1525: -0.081683, 203, a putative hemin-binding 
protein, similar to hypothetical protein ShuT [Shigella
dysenteriae haem transport locus] gi | 2967538 | gb | aaC27815.1 |
(97% identity in 304 amino acids); hemin-binding proteins for
example, [Yersinia pestis]

gi | 6226635 | sp | Q56991 | HMUT#YERPE (34% identity in 253 
amino acids) 

SEQ ID NO: 1613 : -0.262046, 304, a putative
coproporphyrinogen oxidase, similar to coproporphyrinogen
oxidases for example, PhuW [Vibrio parahaemolyticus]
gi | 5106980 | gb | aaD39908.1 | AF119047#1 (35% identity in 371
amino acids) 

SEQ ID NO: 1614 : 0.671015, 139, novel, similar to
hypothetical protein ShuX [Shigella dysenteriae haem transport
locus] gi | 2967537 | gb | aaC27814.1 | (98% identity in 164 amino
acids); hypothetical protein X [Yersinia pestis]
gi | 7467368 | pir | | T12066 (60% identity in 153 amino acids)

SEQ ID NO: 1659 : -0.222178, 249, novel, similar to
hypothetical protein ShuY [Shigella dysenteriae haem transport
locus] gi | 2967536 | gb | aaC27813.1 | (97% identity in 207 amino
acids); hypothetical protein Y [Yersinia pestis] 

gi | 7467369 | pir | | T12067 (55% identity in 204 amino acids) 

SEQ ID NO: - : -0.069143, 176, a putative hemin permease,
similar to hypothetical protein ShuU [Shigella dysenteriae haem
transport locus] gi | 2967535 | gb | aaC27812.1 | (99% identity in
318 amino acids); hemin permeases for example, HmuU
[Yersinia pestis] gi | 6226636 | sp | Q56992 | HMUU#YERPE (66%
identity in 318 amino acids)

SEQ ID NO: 1671 : -0.626137, 89, a putative hemin transport 
system ATP-binding protein, similar to hypothetical
protein ShuV [Shigella dysenteriae haem transport locus]
gi | 2967534 | gb | aaC27811.1 | (98% identity in 256 amino acids);
hemin transport system ATP-binding proteins for
example, HmuV [Yersinia pestis]
gi | 2492539 | sp | Q56993 | HMUV#YERPE (58% identity in 264
amino acids) 

SEQ ID NO: 1241 : -0.4456, 126, a putative fimbrial protein
precursor, similar to fimbrial proteins for example, long polar
fimbrial minor protein precursor [Salmonella typhimurium]
gi | 1170819 | sp | P43664 | LPFE#SALTY (50% identity in 165
amino acids) 

SEQ ID NO: 1242 : 0.022946, 354, a putative fimbrial 
protein precursor, similar to fimbrial proteins for
example, long polar fimbrial protein LpfD [Salmonella
typhimurium] gi | 1170818 | sp | P43663 | LPFD#SALTY (39%
identity in 350 amino acids) 

SEQ ID NO: 1243 : -0.201546, 195, a putative outer
membrane usher protein LpfC precursor (partial), similar to
C-terminal-half part of outer membrane usher proteins for
example, LpfC precursor [Salmonella typhimurium]
gi | 1170817 | sp | P43662 | LPFC#SALTY (67% identity in 485
amino acids), GTG start 

SEQ ID NO: 1244: 0.154275, 270, a putative outer membrane
usher protein, similar to N-terminal-half part of outer
membrane usher proteins for example, LpfC [Salmonella
typhimurium] gi | 1170817 | sp | P43662 | LPFC#SALTY (69%
identity in 357 amino acids), interrupted TAG stop codon

SEQ ID NO: 1245 : 0.251765, 86, a putative fimbrial
chaperone protein, similar to chaperones for example, LpfB
[Salmonella typhimurium]
gi | 1170816 | sp | P43661 | LPFB#SALTY (67% identity in 229
amino acids) 

SEQ ID NO: 1246: -0.375904, 84, a putative fimbrial major
protein precursor, similar to long polar fimbria protein A
precursor, LpfA, of S. typhimurium,
gi | 1170815 | sp | P43660 | LPFA#SALTY (73% identity in 178
amino acids) 

SEQ ID NO: 1247: 0.721244, 194, a putative transcription
regulatory element, similar to (at low level) hypothetical
transcription regulator yisR [Bacillus subtilis]
gi | 3123306 | sp | P40331 (24% identity in 276 amino acids)
SEQ ID NO: 1248 : -0.13819, 454, a putative permease,
similar to hypothetical protein [Salmonella typhimurium]
gi | 7442781 | pir | | C65167 (37% identity in 444 amino acids);
transporter proteins (putative symporters) for example, YicJ
[Escherichia coli (K-12)] gi | 2851421 | sp | P31435 | YICJ#ECOLI
(32% identity in 340 amino acids) 

SEQ ID NO: 1249 : -0.388034, 118, novel, similar to
hypothetical protein [Thermotoga maritima]
gi | 7452109 | pir | | F72395 (37% identity in 635 amino acids)
SEQ ID NO: 1250 : -0.070968, 559, novel, similar to 
hypothetical protein [Neisseria meningitidis MC58] 

gi | 7227012 | gb | aaF42100.1 (39% identity in 398 amino acids)
SEQ ID NO: 1251: -0.387143, 141, novel, TTG start
SEQ ID NO: 1252: -0.435323, 202, novel, TTG start 
SEQ ID NO: 1253: 0.383311, 750, novel, similar to surface 
proteins, for example, [Xylella fastidiosa]
gi | 9106565 | gb | aaF84338.1 | AE003982#11 (24% identity in 1514
amino acids) 

SEQ ID NO: 1254 : -0.125258, 195, identical to lipid 
A-core:surface polymer ligase (WaaL), WaaL [Escherichia coli 
strain F653] gi | 3821825 | gb | aaC69661.1 | (100% identity in 402
amino acids)

SEQ ID NO: 1255: -0.00874, 390, similar to lipopolysaccharide 
1,2-N-acetylglucosaminetransferase (WaaD), WaaD [Escherichia
coli strain F653] gi | 3821826 | gb | aaC69662.1 | (99% identity in
380 amino acids) 

SEQ ID NO: 1256 : 0.065584, 155, a putative
UDP-glucose:(galactosyl) LPS alpha1,2-glucosyltransferase
(WaaJ), similar to UDP-glucose:(galactosyl) LPS
alpha1,2-glucosyltransferase WaaJ [Escherichia coli strain
F653] gi | 3821827 | gb | aaC69663.1 | (98% identity in 184 amino
acids), TTG start

SEQ ID NO: 1257: 0.147325, 244, a lipopolysaccharide core 
biosynthesis, identical to WaaY [Escherichia coli strain F653] 
gi | 3821828 | gb | aaC69664.1 | (100% identity in 235 amino acids)
SEQ ID NO: - : -0.156479, 410,
UDP-D-galactose:(glucosyl)lipopolysaccharide-
alpha-1,3-D-galactosyltransferase, similar to WaaI (strain F653
R3 core type) [Escherichia coli] gi | 3821829 | gb | aaC69665.1 |
(99% identity in 335 amino acids) 
SEQ ID NO: 1427: -0.248606, 252, novel 

SEQ ID NO: 1428 : 0.024841, 158, a putative integrase,
identical to CP4-like integrase [Escherichia coli EDL933]
gi | 3414871 | gb | aaC31482.1 | ; similar to integrases for
example, [Shigella flexneri]
gi | 5532446 | gb | aaD44730.1 | AF141323#1 (95% identity in 390
amino acids)

SEQ ID NO: 1429 : 0.37957, 94, novel, identical to L0004 
[Escherichia coli strain EDL933] gi | 3414872 | gb | aaC31483.1 | ;
similar to hypothetical proteins for example, [Escherichia
coli plasmid pO157 insertion sequence IS911]
gi | 7465897 | pir | | T00224 (56% identity in 116 amino acids),
GTG start 

SEQ ID NO: 1430: 0.897123, 453, novel, identical to L0005 
[Escherichia coli strain EDL933] gi | 3414873 | gb | aaC31484.1 | , 
GTG start 

SEQ ID NO: 1431 : -0.065339, 503, novel, identical to L0006
[Escherichia coli strain EDL933] gi | 3414874 | gb | aaC31485.1 | ;
similar to hypothetical proteins for example, [Vibrio
cholerae] gi | 7960027 | gb | aaF71187.1 | AF179596#7 (60%
identity in 300 amino acids) 

SEQ ID NO: 1432: -0.496629, 90, novel, similar to C-terminal
part of hypothetical proteins for example, b2004 (YeeU)
[Escherichia coli] gi | 3025157 | sp | P76364 | YEEU#ECOLI (84%
identity in 53 amino acids) 

SEQ ID NO: 1433: -0.054196, 287, novel, identical to L0007
[Escherichia coli EDL933] gi | 3414875 | gb | aaC31486.1 | ; similar
to hypothetical proteins for example, b2005 (yeeV)
[Escherichia coli] gi | 3025158 | sp | P76365 | YEEV#ECOLI (88%
identity in 124 amino acids) 

SEQ ID NO: 1434: -0.327731, 120, novel, identical to L0008 
[Escherichia coli EDL933] gi | 3414876 | gb | aaC31487.1 | ; similar
to hypothetical protein [Escherichia coli D1114, O25:K10:H16]
gi | 4887094 | gb | aaD32187.1 | (90% identity in 114 amino acids);
similar to b2006 (YeeW) [Escherichia coli]
gi | 3025160 | sp | P76366 | YEEW#ECOLI (70% identity in 55
amino acids)

SEQ ID NO: 1435: -0.472528, 92, novel, identical to L0009 
[Escherichia coli strain EDL933] gi | 3414877 | gb | aaC31488.1 | ;
similar to hypothetical protein [Escherichia coli D1114,
O25:K10:H16] gi | 4887094 | gb | aaD32187.1 | (84% identity in 59
amino acids); hypothetical protein [Salmonella typhi]
gi | 7800330 | gb | aaF69926.1 | AF250878#87 (46% identity in 49
amino acids) 

SEQ ID NO: 1339: -0.276608, 343, novel, identical to L0010
[Escherichia coli strain EDL933] gi | 3414878 | gb | aaC31489.1 | ;
similar to PH01 [Escherichia coli D1114, O25:K10:H16]
gi | 4887092 | gb | aaD32185.1 | AF127177#3 (62% identity in 78
amino acids) 

SEQ ID NO: 1340: -0.474091, 661, novel, similar to (at low
level) a part of hypothetical protein ydiA [plasmid ColIb-P9]
gi | 4512489 | dbj | Baa75138.1 | (42% identity in 35 amino acids)

SEQ ID NO: 1341: -0.667647, 69, novel, identical to L0012
[Escherichia coli EDL933] gi | 3414880 | gb | aaC31491.1 | ; similar
to a part of putative ATP-binding protein SugR
[Salmonella typhimurium] gi | 4324607 | gb | aaD16951.1 | (45%
identity in 66 amino acids)

SEQ ID NO: 1342: 0.113158, 305, novel, identical to L0013
[Escherichia coli EDL933] gi | 3414881 | gb | aaC31492.1 | ; similar
to hypothetical proteins for example, Hp3 [Escherichia coli
CFT073] gi | 3661484 | gb | aaC61715.1 | (100% identity in 74
amino acids)

SEQ ID NO: 1343: -0.308539, 446, novel, identical to L0014 
[Escherichia coli] gi | 3414882 | gb | aaC31493.1 | ; similar to
hypothetical proteins for example, orf50 [Escherichia coli
plasmid pB171] gi | 6009426 | dbj | Baa84885.1 | (76% identity in
107 amino acids)

SEQ ID NO: 1344: -0.137195, 165, novel, similar to L0015
[Escherichia coli EDL933] gi | 3414883 | gb | aaC31494.1 | (99%
identity in 512 amino acids); hypothetical proteins for
example, [Escherichia coli plasmid pEAF]
gi | 4808945 | gb | aaD30027.1 | AF119170#2 (91% identity in 447
amino acids) 

SEQ ID NO: 1345: 0.057488, 208, novel, similar to a part of
IS630 insertion element hypothetical protein
gi | 1143207 | gb | aaA84873.1 | (72% identity in 25 amino acids)

SEQ ID NO: 1346: 0.933648, 319, novel, similar to a part of
hypothetical proteins for example, [insertion sequence IS91]
gi | 7466597 | pir | | T00311 (75% identity in 49 amino acids)
SEQ ID NO: 1347: -0.269531, 257, a secreted effector protein, 
identical to L0016 [Escherichia coli EDL933] 

gi | 3414884 | gb | aaC31495.1 | ; similar to EspF [Escherichia coli
E2348/69] gi | 2865308 | gb | aaC38400.1 | (87% identity in 205
amino acids) 

SEQ ID NO: 1461: -0.092614, 177, novel, identical to L0017 
[Escherichia coli EDL933] gi | 3414885 | gb | aaC31496.1 | ; similar
to hypothetical proteins for example, [Escherichia coli]
gi | 2809428 | gb | aaC28566.1 | (97% identity in 92 amino acids)
SEQ ID NO: 1462: -0.045584, 352, novel, identical to EscF
[Escherichia coli] gi | 2865306 | gb | aaC38398.1 | ; L0018
[Escherichia coli EDL933] gi | 3414886 | gb | aaC31497.1 | 

SEQ ID NO: 1463: -0.460825, 486, novel, identical to L0019
[Escherichia coli EDL933] gi | 3414887 | gb | aaC31498.1 | ; similar
to hypothetical proteins for example, Orf27 [Escherichia coli
E2348/69] gi | 2865305 | gb | aaC38397.1 | (99% identity in 135
amino acids) 

SEQ ID NO: 1464: -0.264578, 368, an EspB protein (secreted
protein), similar to EspB proteins for example, EspB (L0020)
[Escherichia coli EDL933] gi | 1657263 | emb | Caa65654.1 | (99%
identity in 312 amino acids) 

SEQ ID NO: 1465: -0.234061, 230, an EspD secreted protein, 
identical to L0021 [Escherichia coli EDL933]
gi | 3414889 | gb | aaC31500.1 | ; similar to EspD proteins for
example, gi | 3688279 | emb | Caa76909.1 | (85% identity in 374
amino acids) 

SEQ ID NO: 1466: 0.12809, 179, an EspA secreted protein, 
identical to EspA protein (L0022) [Escherichia coli]
gi | 3115184 | emb | Caa73506.1 | ; similar to EspA proteins for
example, gi | 2388623 | gb | aaB71083.1 | (85% identity in 192
amino acids) 

SEQ ID NO: - : -0.31476, 272, a type III secretion system SepL 
protein, identical to SepL (L0023) [Escherichia coli EDL933]
gi | 3115183 | emb | Caa73505.1 | ; similar to SepL proteins for
example, gi | 2865301 | gb | aaC38393.1 | (94% identity in 351
amino acids) 

SEQ ID NO: 1507: 0.694205, 467, a type III secretion system 
EscD protein, identical to Pas (L0024) [Escherichia coli
EDL933] gi | 3115182 | emb | Caa73504.1 | ; similar to EscD
proteins for example, gi | 3341420 | emb | Caa74170.1 | (97%
identity in 6 amino acids) 

SEQ ID NO: - : -0.414177, 657, a Gamma intimin, identical to
Gamma intimin (L0025) [Escherichia coli strain EDL933]
gi | 3414893 | gb | aaC31504.1 |

SEQ ID NO: - : -0.310441, 432, a chaperone of Tir, identical to
CesT [Escherichia coli O157:H7 strain HAl]
gi | 975876 | gb | aaBOOHO.l | ; similar to CesT protein
[Escherichia coli] gi | 140611 | sp | P21244 | YEAE#ECOLI (96%
identity in 156 amino acids) 

SEQ ID NO: - : -0.190991, 112, a translocated intimin receptor
Tir, identical to translocated intimin receptor Tir (L0027)
[Escherichia coli O157:H7 strain EDL933]
gi | 3414895 | gb | aaC31506.1 |
[0025] 

8) Proteins relating to synthesis of lipopolysaccharide
Sequence number : hydrophobicity, The number of amino
acids, Character such as function

SEQ ID NO: 1333: -0.537097, 249, novel

SEQ ID NO: 1334 : -0.248718, 352, novel, similar to 
hypothetical protein b2760 [Escherichia coli strain K-12]
gi | 7451979 | pir | | D65057 (24% identity in 303 amino acids)
SEQ ID NO: 1335 : -0.612921, 179, novel, similar to
hypothetical protein YgcB [Escherichia coli strain K-12]
gi | 2506493 | sp | P38036 | YGCB#ECOLI (28% identity in 778
amino acids), GTG start 

SEQ ID NO: 1336: -0.429615, 521, similar to YBDY#ECOLI 
gi | 3025009 | sp | P77091 (78% identity in 50 amino acids);
similar to SrnB [plasmid F] dad | AP001918-5 | Baa97875.1
(42% identity in 49 amino acids) 

SEQ ID NO: 1337 : -0.257627, 886, novel, similar to 
hypothetical proteins for example ,Tp70 [Treponema 
pallidum] gi | 7521576 | pir | | A71309 (35% identity in 124 amino
acids)

SEQ ID NO: - : 0.81, 51, novel, similar to N-terminal part of 
hypothetical proteins for example, YgcG [Escherichia coli]
gi | 1723817 | sp | P55140 | YGCG#ECOLI (43% identity in 186
amino acids) 

SEQ ID NO: 1512 : -0.608397, 132, novel

SEQ ID NO: 1513: 0.301786, 225, novel, its N-terminal part 
is similar to N-terminal part of hypothetical proteins for 
example, YgcG [Escherichia coli]
gi | 1723817 | sp | P55140 | YGCG#ECOLI (31% identity in 147
amino acids)

SEQ ID NO: 1514 : 0.238, 51, similar to YGCG#ECOLI 
gi | 1789140 (40% identity in 275 amino acids); similar to 
hypothetical protein [Pseudomonas aeruginosa] 

dad | AE004490-5 | aaG03925.1 (43% identity in 273 amino acids),
GTG start

SEQ ID NO: 1515 : 0.225393, 383, a lipoprotein precursor (type 
III secretion system), similar to type III secretion system 
lipoprotein precursors for example, PrgK protein [Salmonella
typhimurium] gi | 1172615 | sp | P41786 | PRGK#SALTY (53%
identity in 231 amino acids)

SEQ ID NO: - : 0.151648, 274, a type III secretion protein, 
similar to MxiI [Shigella flexneri]
gi | 547954 | sp | Q06080 | MXII#SHIFL (32% identity in 93 amino
acids); PrgJ protein [Salmonella typhimurium]
gi | 1172614 | sp | P41785 | PRGJ#SALTY (31% identity in 87
amino acids) 

SEQ ID NO: 1192: 0.037705, 245, a type III secretion protein, 
similar to putative type III secretion proteins for
example, PrgI protein [Salmonella
typhimurium] gi | 1172613 | sp | P41784 | PRGI#SALTY (64%
identity in 76 amino acids) 

SEQ ID NO: 1193: -0.282727, 111, a putative adherence factor, 
similar to a part of adherence factors for example, Efa1
[Escherichia coli O111:H- strain E45035]
gi | 6013469 | gb | aaD49229.2 | AF159462#1 (amino acids at the
position 433-711/3223) (100% identity in 279 amino acids), 
probably disrupted by frameshift 

SEQ ID NO: 1194: -0.588608, 80, a transposase, identical to 
transposase [Escherichia coli plasmid pO157 IS629]
gi | 7443862 | pir | | T00240

SEQ ID NO: 1195: -0.379918, 245, a transposase, identical to 
hypothetical protein [Escherichia coli plasmid pO157
IS629] gi | 7444868 | pir | | T00241; similar to hypothetical
protein, insertion sequences for example, [Shigella flexneri]
gi | 5532454 | gb | aaD44738.1 | AF141323#9 (96% identity in 108
amino acids) 

SEQ ID NO: 1196: -0.045181, 167, novel, GTG start 
SEQ ID NO: 1197 : -0.081233, 374, novel, similar to 
hypothetical proteins for example, L0014 [Escherichia coli
O157:H7 strain EDL933] gi | 3414882 | gb | aaC31493.1 | (99%
identity in 115 amino acids)

SEQ ID NO: 1198 : 1.038462, 79, novel, similar to 
hypothetical proteins for example, L0015 [Escherichia coli
O157:H7 strain EDL933] gi | 3414883 | gb | aaC31494.1 | (100%
identity in 411 amino acids)

SEQ ID NO: 1199: 0.805162, 151, novel, similar to a part of 
hypothetical proteins for example, L0013 [Escherichia coli
O157:H7 strain EDL933] gi | 3414881 | gb | aaC31492.1 | (55%
identity in 28 amino acids), GTG start, probably disrupted
SEQ ID NO: 1200 : 0.976744, 87, novel, similar to
hypothetical proteins for example, ORF50 [Escherichia coli
plasmid pB171] gi | 6009426 | dbj | Baa84885.1 | (70% identity in
106 amino acids) 

SEQ ID NO: 1201 : 0.748416, 222, novel, similar to 
hypothetical proteins for example, L0015 [Escherichia coli
O157:H7 strain EDL933] gi | 3414883 | gb | aaC31494.1 | (63%
identity in 464 amino acids) 

SEQ ID NO: 1202: -0.236585, 329, novel, similar to a part of 
transposases for example, TnpA [Shigella flexneri]
gi | 5532449 | gb | aaD44733.1 | AF141323#4 (93% identity in 49
amino acids) 

SEQ ID NO: 1203 : -1.506341, 206, novel, similar to 
hypothetical proteins for example, L0004 [Escherichia coli
O157:H7 strain EDL933] gi | 3414872 | gb | aaC31483.1 | (98%
identity in 91 amino acids); putative transposase [Vibrio
cholerae] gi | 7960026 | gb | aaF71186.1 | AF179596#6 (59%
identity in 91 amino acids); hypothetical protein [Escherichia
coli plasmid pO157 insertion sequence IS911]
gi | 7465897 | pir | | T00224 (52% identity in 91 amino acids)

SEQ ID NO: 1204: -0.892208, 78, a putative transcription
regulatory element, similar to regulatory elements (RpiR
family) for example, [Bacillus subtilis]

gi | 8248807 | emb | CAB93068.1 | (25% identity in 236 amino 
acids) 

SEQ ID NO: 1205 : -1.002703, 112, a putative
ferrichrome-binding protein, similar to ferrichrome-binding
proteins for example, [Bacillus subtilis]
gi | 585132 | sp | P37580 | FHUD#BACSU (27% identity in 220
amino acids) 

SEQ ID NO: 1206: -0.212558, 440, a putative ferrichrome
ABC transporter (permease), similar to ferrichrome ABC
transporters (permease) for example, [Bacillus subtilis]
gi | 1706797 | sp | P49937 | FHUG#BACSU (33% identity in 319
amino acids) 

SEQ ID NO: 1207: 0.465452, 687, a putative ferrichrome ABC
transporter (permease), similar to ferrichrome ABC
transporters (permease) for example, [Synechocystis sp.]
gi | 7442493 | pir | | S74438 (43% identity in 315 amino acids);
[Bacillus subtilis] gi | 1706795 | sp | P49936 | FHUB#BACSU (39%
identity in 319 amino acids)

SEQ ID NO: 1208 : -0.209449, 382, a putative ABC-type 
iron-siderophore transport system ATP-binding protein, similar
to ABC-type iron-siderophore transport system ATP-binding
proteins for example, [Synechocystis sp.]
gi | 7442509 | pir | | S74440 (52% identity in 248 amino acids)

SEQ ID NO: 1209: -0.149383, 568, a putative ferrichrome-iron 
receptor precursor, similar to ferrichrome-iron receptor
precursors for example, gi | 7448497 | pir | | S74457 (30%
identity in 688 amino acids) 

SEQ ID NO: 1210: 0.036546, 250, novel, TTG start

SEQ ID NO: 1211 : 1.166101, 60, a PTS dependent
N-acetyl-galactosamine-IID component (AgaE), similar to
PTS dependent N-acetyl-galactosamine-IID component, AgaE
[Escherichia coli strain C]
gi | 8895749 | gb | aaF81085.1 | AF228498#5 (96% identity in 292
amino acids) 

SEQ ID NO: - : -0.257895, 77, a PTS dependent 
N-acetyl-galactosamine- and galactosamine IIA component
(AgaF), similar to PTS dependent N-acetyl-galactosamine- and
galactosamine IIA component, AgaF [Escherichia coli strain C]
gi | 8895750 | gb | aaF81086.1 | AF228498#6 (99% identity in 144
amino acids) 

SEQ ID NO: 1527 : 0.06993, 144, a transposase (insertion 
sequence IS629), identical to hypothetical protein 
gi | 7444868 | pir | | T00241

SEQ ID NO: 1528 : 1.167709, 193, identical to transposase
(insertion sequence IS629), gi | 7443862 | pir | | T00240
SEQ ID NO: 1529: 0.38766, 236, novel 

SEQ ID NO: 1530: -0.008, 226, a leader peptidase, similar to 
leader peptidases for example, HopD (strain ECOR30)
[Escherichia coli] gi | 7674073 | sp | O68932 (92% identity in 155
amino acids); (LT2) [Salmonella typhimurium]
gi | 7674072 | sp | O68927 (68% identity in 148 amino acids)
SEQ ID NO: 1531 : -0.168, 226, novel, similar to hypothetical
protein [Xylella fastidiosa]

gi | 9112262 | gb | aaF85593.1 | AE003851#24 (50% identity in 86 
amino acids) 

SEQ ID NO: - : -0.265401, 238, a putative invasin, similar to 
putative membrane protein b1978 [Escherichia coli K-12]
gi | 1736642 | dbj | Baa15799.1 | (45% identity in 1391 amino
acids); invasin [Yersinia pseudotuberculosis]
gi | 79202 | pir | | A29646 (35% identity in 1211 amino acids)
[0026] 

9) Proteins relating to metabolism 
Sequence number : hydrophobicity, The number of amino
acids, Character such as function 
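
The entry layout stated above (sequence number, hydrophobicity, number of amino acids, character such as function) can be read mechanically. The following is a minimal sketch only, assuming each entry has already been joined onto one logical record; the names `Entry`, `ENTRY_RE` and `parse_entry` are hypothetical and are not part of this specification.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """One listing entry (hypothetical helper type, for illustration only)."""
    seq_id: Optional[int]    # None when the listing shows "-" instead of a number
    hydrophobicity: float    # the second field of the entry
    length: int              # the number of amino acids
    description: str         # the free-text "character such as function" field


# Layout assumed here: "SEQ ID NO: <id or ->: <hydrophobicity>, <length>, <text>"
ENTRY_RE = re.compile(
    r"SEQ ID NO:\s*(?P<id>\d+|-)\s*:\s*"
    r"(?P<hyd>-?\d+(?:\.\d+)?)\s*,\s*"
    r"(?P<len>\d+)\s*,\s*"
    r"(?P<desc>.*)"
)


def parse_entry(text: str) -> Entry:
    """Parse one entry; line breaks inside the entry are collapsed to spaces."""
    flat = " ".join(text.split())
    match = ENTRY_RE.match(flat)
    if match is None:
        raise ValueError(f"not a recognizable entry: {flat[:40]!r}")
    seq_id = None if match["id"] == "-" else int(match["id"])
    return Entry(seq_id, float(match["hyd"]), int(match["len"]), match["desc"].strip())


if __name__ == "__main__":
    sample = """SEQ ID NO: 826: -0.36383, 48, novel, similar to hypothetical
    protein [Bacteriophage 933W]"""
    print(parse_entry(sample))
```

Entries whose sequence number is given as "-" are returned with `seq_id` set to `None`, so they can be told apart from numbered entries without guessing a value.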

SEQ ID NO: 826: -0.36383, 48, novel, similar to hypothetical 
protein [Bacteriophage 933W] gi | 4499789 | emb | CAB39288.1 |
(97% identity in 71 amino acids)
SEQ ID NO: 827: -0.877049, 62, a putative fimbrial chaperone,
similar to fimbrial chaperones for example, [Salmonella
typhimurium] gi | 1170816 | sp | P43661 | LPFB#SALTY (40%
identity in 104 amino acids) 

SEQ ID NO: 828: -0.388722, 134, a putative type 1 fimbrial 
protein, similar to type 1 fimbrial proteins for
example, [Salmonella enteritidis] gi | 913907 | gb | aaB33536.1 |
(31% identity in 198 amino acids) 

SEQ ID NO: 829: 0.010435, 116, novel, similar to conserved 
hypothetical proteins for example, HP0709 [Helicobacter
pylori 26695] gi | 7463979 | pir | | E64608 (88% identity in 300
amino acids)

SEQ ID NO: 830 : -0.455859, 513, novel, similar to 
hypothetical protein [Xylella fastidiosa]
gi | 9104946 | gb | aaF82968.1 | AE003869#5 (33% identity in 270
amino acids)

SEQ ID NO: 831 : -0.335065, 78, novel (hypothetical 
membrane protein) 

SEQ ID NO: 832 : -1.205882, 52, novel, similar to (at low 
level) membrane protein [Staphylococcus aureus] 

gi | 3676428 | gb | aaC61946.1 (26% identity in 236 amino acids)
SEQ ID NO: 833: -0.434677, 249, novel
SEQ ID NO: 834: 0.071739, 93, novel, GTG start
SEQ ID NO: 835 : -0.190411, 74, novel, GTG start
SEQ ID NO: 836 : -0.322222, 136, a raffinose metabolism
(putative glycoprotein), similar to RafY [Escherichia
coli plasmid pRSD2] gi | 1773072 | gb | aaB71432.1 (78%
identity in 464 amino acids) 
SEQ ID NO: 837: -0.195833, 313, novel 

SEQ ID NO: 838 : -0.038235, 375, novel (hypothetical
membrane protein)

SEQ ID NO: 839: -0.158854, 193, a Rhs protein, similar to Rhs 
proteins for example, RhsF [Escherichia coli]
gi | 2920637 | gb | aaC32473.1 | (97% identity in 1394 amino acids),
[RhsII core protein with extension]

SEQ ID NO: 840: -0.174074, 352, novel

SEQ ID NO: 841 : -0.092611, 407, a putative amino acid 
amidohydrolase, similar to amino acid amidohydrolases for
example, benzoylglycine amidohydrolase (hippuricase)
[Campylobacter jejuni] gi | 1170277 | sp | P45493 | HIPO#CAMJE
(46% identity in 383 amino acids)

SEQ ID NO: 842 : -0.384796, 935, a putative
membrane transporter protein, similar to
membrane transporter proteins for example, citrate-proton
symporter [Klebsiella pneumoniae]
gi | 116482 | sp | P16482 | CIT1#KLEPN (30% identity in 429
amino acids) 

SEQ ID NO: 843 : -0.174359, 157, novel, similar to 
hypothetical protein b3122 [Escherichia coli (strain K-12)]
gi | 7466507 | pir | | G65101 (62% identity in 35 amino acids)
SEQ ID NO: 844 : -0.563799, 559, a putative L-sorbose
1-phosphate dehydrogenase, similar to L-sorbose 1-phosphate
dehydrogenases, for example, [Klebsiella pneumoniae]

gi | 586014 | sp | P37084 | SORE#KLEPN (85% identity in 407 
amino acids) 

SEQ ID NO: 845: -0.552709, 204, a putative sorbose-permease
IID component (PTS system), similar to many sorbose-permease
IID components for
example, gi | 548634 | sp | P37083 | PTRD#KLEPN (95% identity in
215 amino acids), probably disrupted (N-terminal part (amino
acids at the position 1-60) is deleted)

SEQ ID NO: 846 : -0.058268, 128, a putative regulatory 
element (repressor), its N-terminal-half part is similar
to hypothetical protein HI1476 [Haemophilus
influenzae] gi | 1175815 | sp | P44207 | YE76#HAEIN (35% identity
in 70 amino acids); its C-terminal-half part is similar to
putative repressor protein [Bacteriophage D108]
gi | 133345 | sp | P07040 | RPC1#BPD10 (26% identity in 79 amino
acids) 

SEQ ID NO: 847: -0.457738, 169, a putative DNA-binding 
protein, similar to Ner-like DNA-binding proteins for
example, gi | 6900348 | emb | CAB71960.1 | (44% identity in 70
amino acids) 

SEQ ID NO: 848 : -0.023279, 306, a putative phage 
transposase, similar to transposases for example, [Neisseria
meningitidis] gi | 7379960 | emb | CAB84536.1 | (40% identity in
639 amino acids) 

SEQ ID NO: 849 : -0.484058, 139, a transposition protein, 
similar to DNA transposition protein B [Bacteriophage Mu]
gi | 139318 | sp | P03763 | VPB#BPMU (48% identity in 298 amino
acids)

SEQ ID NO: 850: -0.9296, 126, novel, similar to (at low level)
phosphoserine phosphatase [Neisseria meningitidis MC58]
gi | 7226221 | gb | aaF41385.1 | (38% identity in 49 amino acids)
SEQ ID NO: 851 : 0.013677, 447, novel
SEQ ID NO: 852: 0.371556, 676, novel, GTG start
SEQ ID NO: 853: 0.247863, 118, novel, GTG start
SEQ ID NO: 854: 0.445454, 100, novel

SEQ ID NO: 855 : -0.008451, 143, putative host-nuclease 
inhibitor, similar to host-nuclease inhibitor protein (Gam) for
example, [Bacteriophage Mu]
gi | 138127 | sp | P06023 | VGAM#BPMU (56% identity in 174
amino acids) 

SEQ ID NO: 856: -0.096842, 191, novel 

SEQ ID NO: 857 : -0.295364, 152, novel, similar to Gp11
[Bacteriophage Mu] gi | 6010385 | gb | aaF01088.1 | AF083977#7
(67% identity in 177 amino acids)

SEQ ID NO: 858: -0.149414, 513, novel, similar to gp12
[Bacteriophage Mu] gi | 215568 | gb | aaA32400.1 | (52% identity
in 168 amino acids)
SEQ ID NO: 859 : -0.454967, 152, novel, similar to gp9
[Bacteriophage Mu] gi | 6010430 | gb | aaF01133.1 | AF083977#54 
(30% identity in 82 amino acids) 
SEQ ID NO: 860: -0.538686, 138, novel 

SEQ ID NO: 861: -0.001626, 124, novel, similar to (at low 
level) zinc finger proteins for example, [Rattus norvegicus]
gi | 141712 | sp | P18745 | Z022#XENLA (33% identity in 48 amino
acids) 

SEQ ID NO: 862: -0.207895, 153, novel 

SEQ ID NO: 863 : 0.275652, 346, novel, similar to 
hypothetical proteins for example, gp16 [Bacteriophage Mu]
gi | 6010390 | gb | aaF01093.1 | AF083977#12 (43% identity in 162
amino acids)
SEQ ID NO: 864: 1.013566, 259, putative positive regulator
of late transcription, similar to transcription regulatory
elements for example, positive regulator of late transcription
(protein C) [Bacteriophage Mu]

gi | 139320 | sp | P06022 | VPC#BPMU (39% identity in 128 amino 
acids) 

SEQ ID NO: 865: 1.206742, 90, an endolysin (host cell lysis), 
similar to endolysins for example, Lys [Bacteriophage Mu]
gi | 126600 | sp | P27359 | LYCV#BPP21 (37% identity in 156 amino
acids) 

SEQ ID NO: 866 : 0.813365, 218, novel, similar to P14 
[Bacteriophage APSE-1]
gi | 6118009 | gb | aaF03957.1 | AF157835#14 (27% identity in 82
amino acids), GTG start 

SEQ ID NO: 867 : -0.361905, 232, novel, similar to P16 
[Bacteriophage APSE-1]
gi | 6118011 | gb | aaF03959.1 | AF157835#16 (46% identity in 81
amino acids)

SEQ ID NO: 868: -0.288945, 200, novel, similar to traR family, 
for example , Orf82 [Bacteriophage P2] 

gi | 732223 | sp | Q06424 | Y082#BPP2 (52% identity in 34 amino 
acids) 

SEQ ID NO: 869 : -0.829907, 108, novel, similar to gp25
[Bacteriophage Mu] gi | 6010400 | gb | aaF01103.1 | AF083977#22
(35% identity in 91 amino acids)

SEQ ID NO: 870: -0.475, 73, novel, similar to hypothetical 
proteins for example, gp26 [Bacteriophage Mu]
gi | 6010401 | gb | aaF01104.1 | AF083977#23 (82% identity in 95
amino acids) 

SEQ ID NO: 871 : -0.715504, 130, novel, similar to 
hypothetical proteins for example, gp27 [Bacteriophage Mu]
gi | 6010402 | gb | aaF01105.1 | AF083977#24 (56% identity in 193
amino acids)

SEQ ID NO: 872 : 0.351219, 42, a putative portal protein,
similar to hypothetical proteins for example, gp28 (possible
portal protein H) [Bacteriophage Mu]
gi | 6010403 | gb | aaF01106.1 | AF083977#25 (73% identity in 537
amino acids)

SEQ ID NO: 873 : -0.262814, 399, novel, similar to 
hypothetical proteins for example, gp29 [Bacteriophage Mu]
gi | 6010404 | gb | aaF01107.1 | AF083977#26 (57% identity in 529
amino acids) 

SEQ ID NO: 874 : -0.127574, 273, novel, similar to
hypothetical proteins for example, gp30 [Bacteriophage Mu]
gi | 6010405 | gb | aaF01108.1 | AF083977#27 (60% identity in 437
amino acids) 

SEQ ID NO: 875 : -0.857143, 78, a virion morphogenesis, 
similar to G protein [Bacteriophage Mu]
gi | 267389 | sp | Q01261 | VPG#BPMU (53% identity in 151 amino
acids) 

SEQ ID NO: - : -0.821875, 65, a potential protease protein, 
similar to gpl [Bacteriophage Mu] gi | 7226336 | gb | aaF41489.1 | 

(31% identity in 369 amino acids)

SEQ ID NO: 1686 : -0.40171, 118, a putative major head 
subunit, similar to protein T [Bacteriophage Mu]
gi | 6010409 | gb | aaF01112.1 | AF083977#31 (66% identity in 311
amino acids); hypothetical proteins for example, [Neisseria
meningitidis] gi | 6900377 | emb | CAB71989.1 | (50% identity in
311 amino acids) 

SEQ ID NO: 1687: -0.015888, 108, novel, similar to gp35 
[Bacteriophage Mu] gi | 6010410 | gb | aaF01113.1 | AF083977#32
(40% identity in 62 amino acids)
SEQ ID NO: 1533 : -0.455151, 331, novel, similar to
hypothetical proteins for example, gp36 [Bacteriophage Mu]
gi | 6010411 | gb | aaF01114.1 | AF083977#33 (46% identity in 139
amino acids) 

SEQ ID NO: 1534: -0.761539, 105, novel, similar to
hypothetical proteins, for example, gp37 [Bacteriophage Mu]
gi | 1175870 | sp | P44231 | YF09#HAEIN (33% identity in 187
amino acids)

SEQ ID NO: 1535: -0.293125, 161, novel, similar to
hypothetical proteins, for example, gp38 [Bacteriophage Mu]
gi | 6010413 | gb | aaF01116.1 | AF083977#35 (54% identity in 52
amino acids)

SEQ ID NO: 1536: -0.370046, 218, a major tail subunit (sheath
protein), similar to sheath protein GpL [Bacteriophage Mu]
gi | 1834291 | dbj | Baa19195.1 | (51% identity in 499 amino
acids); hypothetical proteins, for example, [Haemophilus
influenzae Rd] gi | 1175872 | sp | P44233 | YF11#HAEIN (40%
identity in 499 amino acids)

SEQ ID NO: 1564: -0.396053, 77, novel, similar to
hypothetical proteins, for example, GpM [Bacteriophage Mu]
gi | 1834292 | dbj | Baa19196.1 | (49% identity in 120 amino acids)

SEQ ID NO: 1565: -0.199849, 663, novel, similar to
hypothetical proteins, for example, ORF3 [Bacteriophage Mu]
gi | 1834293 | dbj | Baa19197.1 | (49% identity in 122 amino
acids)

[0027]

10) Proteins processing DNA/RNA 

[illegible], The number of amino acids, Character such as function

SEQ ID NO: 1395: -0.645885, 803, a type III secretion protein
(surface presentation of antigens), similar to N-terminal part of
putative type III secretion proteins, for example, SpaR
protein (surface presentation of antigens) [Salmonella
typhimurium] gi | 730799 | sp | P40701 | SPAR#SALTY (44%
identity in 144 amino acids), probably interrupted
SEQ ID NO: 1396: -0.414798, 224, a type III secretion protein,
similar to type III secretion proteins, for example, SpaQ
[Salmonella enterica] gi | 975756 | gb | aaC43847.1 | (68% identity
in 86 amino acids)

SEQ ID NO: 1397: -0.230128, 157, type III secretion protein,
similar to type III secretion proteins, for example, SpaP
[Salmonella enterica] gi | 975755 | gb | aaC43846.1 | (69% identity
in 218 amino acids)

SEQ ID NO: 1398: 0.60339, 60, type III secretion protein,
similar to type III secretion proteins, for example, SpaO
[Salmonella enterica] gi | 973277 | gb | aaC43944.1 | (32% identity
in 292 amino acids)

SEQ ID NO: 1399: -0.623677, 795, type III secretion protein,
similar to C-terminal part of type III secretion proteins, for
example, SpaN [Salmonella enterica]
gi | 1155289 | gb | aaC44993.1 | (38% identity in 82 amino acids),
TTG start

SEQ ID NO: 1400: -0.241304, 47, novel 

SEQ ID NO: -: -0.288136, 60, a type III secretion protein,
similar to type III secretion proteins, for example, SpaM
[Salmonella enterica] gi | 1155297 | gb | aaC44998.1 | (29%
identity in 146 amino acids)

SEQ ID NO: 1412: -0.074167, 361, a putative tape measure
protein, similar to hypothetical proteins, for example, Gp42
(putative tape measure protein) [Bacteriophage Mu]
gi | 6010417 | gb | aaF01120.1 | AF083977#39 (36% identity in 686
amino acids)

SEQ ID NO: 1413: -0.064607, 357, a putative DNA circulation
protein, similar to DNA circulation proteins, for example,
protein N [Bacteriophage Mu]
gi | 6010418 | gb | aaF01121.1 | AF083977#40 (33% identity in 441
amino acids)

SEQ ID NO: 1414: -0.374289, 845, a putative tail protein,
similar to tail proteins, for example, P protein
[Bacteriophage Mu] gi | 139353 | sp | P08558 | VPP#BPMU (47%
identity in 360 amino acids), GTG start

SEQ ID NO: 1415: 0.2, 54, novel, similar to hypothetical
proteins, for example, gp45 [Bacteriophage Mu]
gi | 6010420 | gb | aaF01123.1 | AF083977#42 (51% identity in 195
amino acids), may be involved in base plate assembly

SEQ ID NO: 1416: -0.05748, 128, novel, similar to
hypothetical proteins, for example, Gp46 [Bacteriophage Mu]
gi | 6010421 | gb | aaF01124.1 | AF083977#43 (53% identity in 144
amino acids)

SEQ ID NO: 1417: -0.1945, 201, novel, similar to hypothetical
proteins, for example, Gp47 [Bacteriophage Mu]
gi | 6010422 | gb | aaF01125.1 | AF083977#44 (53% identity in 380
amino acids)

SEQ ID NO: 1661: -0.169, 301, novel, similar to hypothetical
proteins, for example, Gp48 [Bacteriophage Mu]
gi | 6010423 | gb | aaF01126.1 | AF083977#45 (48% identity in 183
amino acids)

SEQ ID NO: 1556: -0.241844, 283, a putative tail fiber,
similar to S protein [Bacteriophage Mu]
gi | 6010424 | gb | aaF01127.1 | AF083977#46 (46% identity in 198
amino acids); hypothetical proteins, for example, Bcv
[Shigella boydii] gi | 96900 | pir | | A42463 (56% identity in 78
amino acids)

SEQ ID NO: 1557: 0.691919, 100, a putative tail fiber
assembly protein, similar to unnamed protein product
[Bacteriophage 186] gi | 3522882 | gb | aaC34165.1 | (39% identity
in 173 amino acids); tail fiber assembly proteins, for
example, U protein [Bacteriophage Mu]
gi | 6010425 | gb | aaF01128.1 | AF083977#47 (28% identity in 176
amino acids)

SEQ ID NO: 1687: 1.052233, 292, similar to a C-terminal part
of tail fiber protein (partial), C-terminal part of tail fiber
proteins, for example, S [Bacteriophage Mu]
gi | 6010424 | gb | aaF01127.1 | AF083977#46 (38% identity in 65
amino acids)

SEQ ID NO: -: -0.43064, 298, a putative invertase, similar to
site-specific recombinases, for example, DNA-invertase, for
example, in [Bacteriophage Mu]
gi | 6010426 | gb | aaF01129.1 | AF083977#50 (75% identity in 181
amino acids)

SEQ ID NO: 1600: -0.069079, 305, novel, similar to
hypothetical proteins, for example, L0105 [Bacteriophage
933W] gi | 4585419 | gb | aaD25447.1 | AF125520#42 (73% identity
in 614 amino acids)

SEQ ID NO: -: -0.338889, 73, novel, similar to orf25
[Bacteriophage 933W] gi | 4499806 | emb | CAB39305.1 | (52%
identity in 57 amino acids)

SEQ ID NO: 1616: -0.524138, 465, novel, similar to
hypothetical proteins, for example, L0106 [Bacteriophage
933W] gi | 4585420 | gb | aaD25448.1 | AF125520#43 (41% identity
in 79 amino acids)

SEQ ID NO: 1630: -0.041597, 239, novel 
[0028] 

11) Proteins relating pathogenicity 

[illegible], The number of amino acids, Character such as function

SEQ ID NO: 1631: 0.342857, 225, a type III secretion protein
(ATP synthetase), similar to putative type III secretion
proteins (ATP synthetase), for example, invC [Salmonella
typhimurium] gi | 730791 | sp | P39444 | SPAL#SALTY (63%
identity in 387 amino acids)

SEQ ID NO: 1472: -0.763847, 1395, a type III secretion protein,
similar to type III secretion proteins, for example, InvA
[Salmonella typhimurium] gi | 476819 | pir | | A42888 (64%
identity in 686 amino acids)

SEQ ID NO: -: -0.352577, 98, a type III secretion protein,
similar to type III secretion proteins, for example, invasion
protein [Salmonella enterica] gi | 1236845 | gb | aaC45041.1 |
(37% identity in 355 amino acids)

SEQ ID NO: 1552: -0.029639, 389, a type III secretion protein,
similar to type III secretion proteins, for example, InvG
[Salmonella typhimurium]
gi | 1170574 | sp | P35672 | INVG#SALTY (53% identity in 558
amino acids)

SEQ ID NO: -: 0.760046, 439, a transcriptional regulator of
type III secretion system, similar to transcriptional regulators,
for example, invF [Salmonella typhimurium]
gi | 729852 | sp | P39437 | INVF#SALTY (40% identity in 200
amino acids)

SEQ ID NO: 690: -0.029412, 52, novel, GTG start

SEQ ID NO: 691: -0.113448, 410, novel, GTG start

SEQ ID NO: 692: 0.817973, 218, novel, similar to
hypothetical proteins, for example, [Methanobacterium
thermoautotrophicum] gi | 7482365 | pir | | D69031 (32% identity
in 100 amino acids)
SEQ ID NO: 693: -0.541477, 177, a putative transporter,
similar to hypothetical protein [plasmid pNZ4000]
gi | 5123516 | gb | aaD40355.1 | (31% identity in 185 amino acids);
similar to (at low level) putative low-affinity inorganic
phosphate transporter [Mycobacterium tuberculosis]
gi | 7387993 | sp | O06411 | PIT#MYCTU (26% identity in 212
amino acids)

SEQ ID NO: 694: -0.540244, 83, a hypothetical lipoprotein,
similar to hypothetical proteins, for example, [plasmid
pNZ4000] gi | 5123517 | gb | aaD40356.1 | (25% identity in 209
amino acids)

SEQ ID NO: 695: -0.645115, 697, a putative ABC transporter
ATP-binding subunit, similar to ABC transporter ATP-binding
subunits, for example, cation ABC transporter (ATP-binding
protein) homolog ykoD [Bacillus subtilis]
gi | 7445788 | pir | | H69858 (32% identity in 201 amino acids)

SEQ ID NO: 696: -0.096774, 311, a putative ABC-transporter
ATP-binding subunit, similar to ABC-transporter ATP-binding
subunits, for example, PotA homolog [Agrobacterium
rhizogenes plasmid pRi1724] gi | 8918682 | dbj | Baa97747.1 |
(35% identity in 223 amino acids); [plasmid pNZ4000]
gi | 5123519 | gb | aaD40358.1 | (30% identity in 211 amino acids)
SEQ ID NO: 697: 0.076712, 74, novel, similar to
YGGC#ECOLI gi | 1789296 (83% identity in 233 amino acids),
but comprising different C-terminal part;
similar to kinase-like protein FrcK [Sinorhizobium meliloti]
dad | AF196574-5 | aaG28501.1 (38% identity in 190 amino acids),
GTG start

SEQ ID NO: 698: -0.44881, 85, novel (hypothetical
lipoprotein)

SEQ ID NO: 699: -0.246237, 94, an integrase, similar to
integrases, for example, [prophage P4]
gi | 6179516 | emb | CAB59974.1 | (55% identity in 414 amino
acids)

SEQ ID NO: 700: -0.042222, 91, novel, similar to C-terminal
part of hypothetical proteins, for example, L0015
[Escherichia coli O-157:H7 strain EDL933]
gi | 4808945 | gb | aaD30027.1 | AF119170#2 (88% identity in 206
amino acids), GTG start, probably disrupted

SEQ ID NO: 701: -0.378351, 98, novel, similar to a part of
hypothetical proteins, for example, L0013 [Escherichia coli
O-157:H7 strain EDL933] gi | 3414881 | gb | aaC31492.1 | (100%
identity in 44 amino acids), GTG start, probably disrupted

SEQ ID NO: 702: -0.572727, 177, novel, similar to
hypothetical proteins, for example, ORF29 [Escherichia coli
plasmid pB171] gi | 6009405 | dbj | Baa84864.1 | (39% identity in
204 amino acids)

SEQ ID NO: 703: -0.159444, 181, novel, similar to
hypothetical proteins, for example, ORF30 [Escherichia coli
plasmid pB171] gi | 6009406 | dbj | Baa84865.1 | (80% identity in
115 amino acids)

SEQ ID NO: 704: 0.131638, 178, novel, similar to
hypothetical proteins, for example, ORF31 [Escherichia coli
plasmid pB171] gi | 6009427 | dbj | Baa84886.1 | (63% identity in
468 amino acids)

SEQ ID NO: 705: -0.321053, 172, novel, similar to
hypothetical protein [Salmonella choleraesuis]
gi | 7467227 | pir | | T28668 (43% identity in 261 amino acids)

SEQ ID NO: 706: -0.725, 97, a putative virulence-related
membrane protein, similar to virulence-related membrane
proteins, for example, pagC [Salmonella typhimurium]
gi | 129558 | sp | P23988 | PAGC#SALTY (45% identity in 171
amino acids)

SEQ ID NO: 707: -0.170161, 125, novel 

SEQ ID NO: 708: -1.030769, 66, novel, similar to (at low level)
hypothetical proteins, for example, FhaB [Neisseria
meningitidis] gi | 6900333 | emb | CAB71945.1 | (37% identity in
48 amino acids), GTG start

SEQ ID NO: 709: 0.1, 99, novel, identical to L0028
[Escherichia coli O-157:H7 strain EDL933]
gi | 3414896 | gb | aaC31507.1 |; similar to hypothetical proteins,
for example, [Escherichia coli] gi | 3249026 | gb | aaC69313.1 |
(99% identity in 203 amino acids)

SEQ ID NO: 710: -0.514201, 170, novel, identical to L0029
[Escherichia coli O-157:H7 strain EDL933]
gi | 3414897 | gb | aaC31508.1 |; similar to rOrf10 [Escherichia
coli] gi | 2865295 | gb | aaC38388.1 | (78% identity in 119 amino
acids)

SEQ ID NO: 711: -0.516312, 142, novel, identical to L0030
[Escherichia coli O-157:H7 strain EDL933]
gi | 3414898 | gb | aaC31509.1 |; similar to Orf18 [Escherichia
coli] gi | 2865294 | gb | aaC38387.1 | (74% identity in 159 amino
acids)

SEQ ID NO: 712: -0.221687, 167, a type III secretion system
SepQ protein, identical to L0031 [Escherichia coli O-157:H7
strain EDL933] gi | 3414899 | gb | aaC31510.1 |; similar to SepQ
[Escherichia coli strain E2348/69] gi | 2865293 | gb | aaC38386.1 |
(93% identity in 305 amino acids)

SEQ ID NO: 713: -0.647059, 86, novel, similar to Orf16
[Escherichia coli strain E2348/69] gi | 2865292 | gb | aaC38385.1 |
(97% identity in 138 amino acids); L0032 [Escherichia coli
O-157:H7 strain EDL933] gi | 3414900 | gb | aaC31511.1 | (100%
identity in 91 amino acids)

SEQ ID NO: 714: -0.245946, 149, novel, identical to L0033
[Escherichia coli O-157:H7 strain EDL933]
gi | 3414901 | gb | aaC31512.1 |

SEQ ID NO: 715: -0.574667, 76, a type III secretion system
protein EscN, identical to EscN (L0034) [Escherichia coli
O-157:H7 strain EDL933] gi | 3414902 | gb | aaC31513.1 |

SEQ ID NO: 716: -0.092157, 103, a type III secretion system
EscV protein, identical to EscV (L0035) [Escherichia coli
O-157:H7 strain EDL933] gi | 3414903 | gb | aaC31514.1 |

SEQ ID NO: 717: -0.296875, 97, novel, identical to Orf12
[Escherichia coli strain E2348/69] gi | 2865288 | gb | aaC38381.1 |;
L0036 [Escherichia coli O-157:H7 strain EDL933]
gi | 3414904 | gb | aaC31515.1 |

SEQ ID NO: 718: -0.570466, 194, identical to type III secretion
system SepZ protein, SepZ proteins, for example,
[Escherichia coli O-157:H7 strain EDL933]
gi | 3414905 | gb | aaC31516.1 |

SEQ ID NO: 719: -0.367148, 555, novel, identical to L0038
[Escherichia coli O-157:H7 strain EDL933]
gi | 3414906 | gb | aaC31517.1 |; similar to rOrf8 [Escherichia coli
E2348/69] gi | 2865287 | gb | aaC38380.1 | (92% identity in 142
amino acids)

SEQ ID NO: 720: -0.356102, 509, a type III secretion system
EscJ protein, identical to EscJ [Escherichia coli strain
E2348/69] gi | 2865286 | gb | aaC38379.1 |; L0039 (EscJ)
[Escherichia coli O-157:H7 strain EDL933]
gi | 3414907 | gb | aaC31518.1 |

SEQ ID NO: 721: -0.399319, 442, a type III secretion system
protein SepD, identical to SepD (L0040) [Escherichia coli
O-157:H7 strain EDL933] gi | 3414908 | gb | aaC31519.1 |; similar
to SepD proteins, for example, [Escherichia coli strain
E2348/69] gi | 886476 | emb | Caa90273.1 | (98% identity in 151
amino acids)

SEQ ID NO: 722: -0.538854, 158, a type III secretion system
EscC protein, identical to EscC (L0041) [Escherichia coli
O-157:H7 strain EDL933] gi | 3414909 | gb | aaC31520.1 |

SEQ ID NO: 723: -0.272994, 375, a type III secretion system
CesD protein, identical to CesD (L0042) [Escherichia coli
O-157:H7 strain EDL933] gi | 3414910 | gb | aaC31521.1 |

SEQ ID NO: 724: -0.223492, 316, novel, identical to L0043
[Escherichia coli O-157:H7 strain EDL933]
gi | 3414911 | gb | aaC31522.1 |; similar to Orf11 [Escherichia coli
strain E2348/69] gi | 2865282 | gb | aaC38375.1 | (98% identity in
137 amino acids)

SEQ ID NO: 725: -0.455469, 129, novel, identical to L0044
[Escherichia coli O-157:H7 strain EDL933]
gi | 3414912 | gb | aaC31523.1 |; similar to Orf10 [Escherichia coli
strain E2348/69] gi | 2865281 | gb | aaC38374.1 | (98% identity in
123 amino acids)

SEQ ID NO: 726: -0.330216, 140, novel, identical to L0045
[Escherichia coli O-157:H7 strain EDL933]
gi | 3414913 | gb | aaC31524.1 |; similar to rOrf3 [Escherichia coli
strain E2348/69] gi | 2865280 | gb | aaC38373.1 | (98% identity in
152 amino acids)

SEQ ID NO: 727: -0.154301, 187, a type III secretion system
EscU protein, identical to EscU (L0046) [Escherichia coli
O-157:H7 strain EDL933] gi | 3414914 | gb | aaC31525.1 |

SEQ ID NO: 728: -0.764198, 82, a type III secretion system
EscT protein, identical to EscT (L0047) [Escherichia coli
O-157:H7 strain EDL933] gi | 3414915 | gb | aaC31526.1 |

SEQ ID NO: 729: -0.1374, 501, a type III secretion system EscS
protein, identical to EscS (L0048) [Escherichia coli O-157:H7
strain EDL933] gi | 3414916 | gb | aaC31527.1 |

SEQ ID NO: 730: -0.500827, 122, a type III secretion system
EscR protein, identical to EscR (L0049) [Escherichia coli
O-157:H7 strain EDL933] gi | 3414917 | gb | aaC31528.1 |

SEQ ID NO: 731: -0.213291, 159, novel, identical to L0050
[Escherichia coli O-157:H7 strain EDL933]
gi | 3414918 | gb | aaC31529.1 |; similar to Orf5 [Escherichia coli
strain E2348/69] gi | 2865275 | gb | aaC38368.1 | (98% identity in
231 amino acids)

SEQ ID NO: 732: -0.205065, 692, novel, identical to L0051
[Escherichia coli O-157:H7 strain EDL933]
gi | 3414919 | gb | aaC31530.1 |; similar to Orf4 [Escherichia coli
strain E2348/69] gi | 2865274 | gb | aaC38367.1 | (99% identity in
199 amino acids)

SEQ ID NO: 733: -0.131141, 457, novel, identical to Orf3
[Escherichia coli E2348/69] gi | 2865273 | gb | aaC38366.1 |; L0052
[Escherichia coli O-157:H7 strain EDL933]
gi | 3414920 | gb | aaC31531.1 |

SEQ ID NO: 734: -0.277807, 375, novel, similar to Orf2
[Escherichia coli strain E2348/69] gi | 2865272 | gb | aaC38365.1 |
(98% identity in 72 amino acids); L0053 [Escherichia coli
O-157:H7 strain EDL933] gi | 3414921 | gb | aaC31532.1 | (98%
identity in 72 amino acids)

SEQ ID NO: 735: -0.335784, 205, a transcription regulatory
element, identical to L0054 [Escherichia coli O-157:H7 strain
EDL933] gi | 3414922 | gb | aaC31533.1 |; similar to Orf1 Ler
[Escherichia coli strain E2348/69] gi | 2865271 | gb | aaC38364.1 |
(99% identity in 129 amino acids)

SEQ ID NO: 736: -0.142069, 146, novel 

SEQ ID NO: 737: -0.199169, 362, a secreted effector protein,
identical to L0055 [Escherichia coli O-157:H7 strain EDL933]
gi | 3414923 | gb | aaC31534.1 |; similar to rOrf2 EspG
[Escherichia coli strain E2348/69] gi | 2865270 | gb | aaC38363.1 |
(97% identity in 398 amino acids)

SEQ ID NO: 738: -0.374731, 187, novel, identical to L0056
[Escherichia coli O-157:H7 strain EDL933]
gi | 3414924 | gb | aaC31535.1 |; similar to rOrf1 [Escherichia
coli strain E2348/69] gi | 2865269 | gb | aaC38362.1 | (99% identity
in 272 amino acids)

SEQ ID NO: 739: -0.368977, 304, novel, TTG start 

SEQ ID NO: 740: -0.53815, 174, novel

SEQ ID NO: 741: -0.097015, 68, novel, similar to hypothetical
proteins, for example, NMA0565 [Neisseria meningitidis]
gi | 7379302 | emb | CAB83857.1 | (35% identity in 118 amino acids)

SEQ ID NO: 742: -0.458602, 187, novel 

SEQ ID NO: 743: -0.278645, 680, a putative transcriptional [sic,
translational] regulator, similar to transcriptional [sic,
translational] regulators, for example, Com protein
(transcriptional [sic, translational] regulator of Mom)
[Bacteriophage Mu] gi | 7388376 | sp | Q53979 | VCOM#SHIDY (46%
identity in 57 amino acids)

SEQ ID NO: 744: 0.096667, 61, a putative DNA modification
protein, similar to DNA modification proteins, for
example, Mom protein [Bacteriophage Mu]
gi | 138782 | sp | P06018 | VMOM#BPMU (76% identity in 245
amino acids), GTG start

SEQ ID NO: 745: -0.679012, 82, a sorbose-permease IID
component (PTS system), similar to sorbose-permease IID
components, for example, [Klebsiella pneumoniae]
gi | 548634 | sp | P37083 | PTRD#KLEPN (92% identity in 64 amino
acids), interrupted by phage-insertion

SEQ ID NO: 746: -0.529126, 104, a sorbose-permease IIC
component (PTS system), similar to sorbose-permease IIC
components, for example, [Klebsiella pneumoniae]
gi | 548633 | sp | P37082 | PTRC#KLEPN (92% identity in 265
amino acids)

SEQ ID NO: 747: -0.937879, 67, a sorbose-permease IIB
component (PTS system), similar to sorbose-permease IIB
components, for example, [Klebsiella pneumoniae]
gi | 1142714 | gb | aaB04152.1 | (46% identity in 162 amino acids)

SEQ ID NO: 748: -0.563673, 246, a putative sorbose-permease
IIA component (PTS system), similar to sorbose-permease IIA
components, for example, [Klebsiella pneumoniae]
gi | 548631 | sp | P37080 | PTRA#KLEPN (71% identity in 135
amino acids)

SEQ ID NO: 749: -0.055385, 66, a sorbitol-6-phosphate
2-dehydrogenase, similar to sorbitol-6-phosphate
2-dehydrogenases, for example, [Klebsiella pneumoniae]
gi | 548951 | sp | P37079 | SORD#KLEPN (86% identity in 268
amino acids)

SEQ ID NO: 750: 0.997359, 266, a putative sorbitol operon
regulatory element (activator), similar to sorbitol operon
regulatory element (SorC family), for example, [Klebsiella
pneumoniae] gi | 548950 | sp | P37078 | SORC#KLEPN (86%
identity in 315 amino acids)

SEQ ID NO: 751: -0.115244, 165, a putative regulatory protein,
similar to regulatory proteins, for example, aerobic respiration
control protein [Zymomonas mobilis]
gi | 4511977 | gb | aaD21537.1 | (39% identity in 230 amino acids)

SEQ ID NO: 752: 0.19037, 136, a putative sugar kinase,
similar to sugar kinases, for example, fructokinase homolog
ydjE [Bacillus subtilis] gi | 3915420 | sp | O34768 | YDJE#BACSU
(24% identity in 326 amino acids)

SEQ ID NO: 753: -0.159702, 269, a putative aldolase, similar
to aldolases, for example, fructose-bisphosphate aldolase (EC
4.1.2.13) Fbaa [Bacillus subtilis]
gi | 543796 | sp | P13243 | ALF1#BACSU (41% identity in 286
amino acids)

SEQ ID NO: 754: -0.218413, 316, novel, similar to (at low
level) a part of hypothetical protein ydaE [Bacillus subtilis]
gi | 7474928 | pir | | E69768 (35% identity in 51 amino acids)

SEQ ID NO: 1322: 0.197872, 236, a putative
carbohydrate-binding protein, similar to C-terminal part of
carbohydrate-binding proteins, for example, bifunctional
carbohydrate binding and transporter protein [Streptomyces
coelicolor A3(2)] gi | 6714794 | emb | CAB66286.1 | (35% identity
in 304 amino acids); ribose ABC transporter (ribose-binding
protein) rbsB [Bacillus subtilis]
gi | 6174949 | sp | P36949 | RBSB#BACSU (36% identity in 261
amino acids)

SEQ ID NO: 1323: -0.163964, 334, a putative carbohydrate
ABC transporter (permease), similar to carbohydrate ABC
transporters (permease), for example, ribose ABC transporter
(permease) rbsC [Bacillus subtilis] gi | 7446897 | pir | | B69690
(43% identity in 317 amino acids)

SEQ ID NO: 1324: 0.066434, 287, a putative sugar ABC
transporter, ATP-binding protein, similar to sugar ABC
transporter, ATP-binding proteins, for example, ribose ABC
transporter (ATP-binding protein) rbsA [Bacillus subtilis]
gi | 7404442 | sp | P36947 | RBSA#BACSU (45% identity in 489
amino acids)

SEQ ID NO: 1325: -0.440969, 228, a putative histidine
protein kinase, similar to histidine protein kinases, for
example, histidine protein kinase-response regulator hybrid
protein CvgSY [Pseudomonas syringae pv. syringae]
gi | 5019771 | gb | aaD37857.1 | AF133263#2 (43% identity in 364
amino acids)

SEQ ID NO: 1326: -0.003195, 314, a putative transposase,
similar to transposase homolog A [Helicobacter pylori]
gi | 2114470 | gb | aaD11513.1 | (60% identity in 137 amino acids)

SEQ ID NO: 1327: 1.026235, 325, a putative transposase,
similar to B1432#ECOLI gi | 1787702 (96% identity in 402 amino
acids); transposases, for example, ORFB [Xylella fastidiosa]
gi | 9105393 | gb | aaF83346.1 | AE003901#9 (38% identity in 321
amino acids)

SEQ ID NO: 1328: -0.04664, 507, a putative integrase,
similar to (at low level) integrases, for example, integrase
[Bacteriophage TPW22]
gi | 6465908 | gb | aaF12706.1 | AF066865#4 (23% identity in 342
amino acids)

SEQ ID NO: -: -0.010053, 757, identical to transposase
(insertion sequence IS629), gi | 7444868 | pir | | T00241

SEQ ID NO: 1820: -0.25035, 144, identical to transposase
(insertion sequence IS629), [Escherichia coli plasmid pO157]
gi | 7443882 | pir | | T00240

SEQ ID NO: 1621: -0.587696, 383, novel

SEQ ID NO: 1310: -0.455932, 650, novel, TTG start 

SEQ ID NO: 1311 : -0.965741, 109, novel, TTG start 

SEQ ID NO: 1312: -0.397973, 297, novel, similar to (at low
level) hypothetical proteins [Staphylococcus aureus], for
example, gi | 7594765 | dbj | Baa94663.1 | (30% identity in 143
amino acids); hypothetical protein [Neisseria meningitidis]
gi | 5051461 | emb | CAB44981.1 | (28% identity in 140 amino
acids)

SEQ ID NO: 1313: -0.511702, 95, a putative resolvase, similar
to resolvases, for example, resolvase [Escherichia coli
transposon Tn2501] gi | 135944 | sp | P05823 | TNP0#ECOLI (45%
identity in 179 amino acids)
[0029] 

12) Other proteins 

[illegible], The number of amino acids, Character such as function

SEQ ID NO: 1314: 0.037273, 111, putative transposase, similar
to C-terminal part of transposases, for example, [Escherichia
coli Tn5] gi | 622948 | gb | aaB60064.1 |, maybe disrupted

SEQ ID NO: 1315: -0.213793, 59, novel, similar to a part of
KfaE protein [Escherichia coli] gi | 628752 | pir | | S45104 (55%
identity in 52 amino acids)

SEQ ID NO: 1316: -0.256129, 158, a putative enterotoxin,
similar to ShET2 enterotoxin [Shigella flexneri]
gi | 1109754 | emb | Caa90938.1 | (38% identity in 539 amino
acids); similar to a part of hypothetical protein, for example,
ankyrin-like regulatory protein [Escherichia coli]
gi | 418526 | sp | P23325 | ARP#ECOLI (28% identity in 172 amino
acids) (at low level)

SEQ ID NO: 1317: -0.050262, 192, novel, similar to sB protein,
for example, [insertion element iso-IS1N]
gi | 124919 | sp | P03832 | ISBN#SHIDY (69% identity in 49 amino
acids), TTG start

SEQ ID NO: 1318: -0.438356, 439, novel, similar to a
hypothetical protein [Salmonella typhimurium]
gi | 6960367 | gb | aaF33527.1 | (63% identity in 306 amino acids)

SEQ ID NO: 1319: -0.524125, 258, novel 

SEQ ID NO: 1320: -0.435714, 155, novel, similar to a
hypothetical protein in insertion elements, for example, [IS630]
gi | 140943 | sp | P16943 | YIS5#SHISO (88% identity in 282 amino
acids)

SEQ ID NO: 1014: -0.510181, 276, a putative adherence factor,
similar to N-terminal part of adherence factors (amino acids at
the position 1-433/3223), for example, Efa1 [Escherichia coli
O111:H- strain E45035]
gi | 6013469 | gb | aaD49229.2 | AF159462#1 (99% identity in 433
amino acids), probably interrupted by frameshift

SEQ ID NO: 1015: -0.496819, 284, a putative DNA-binding
protein, similar to putative DNA-binding protein [Neisseria
meningitidis] gi | 7379301 | emb | CAB83856.1 | (47% identity in
101 amino acids)

SEQ ID NO: 1016: -0.412037, 109, novel 

SEQ ID NO: 1017: -0.505722, 368, a putative transcription
regulatory element, its N-terminal part is similar to
transcription regulatory elements, for example, BamHI control
element [Bacillus amyloliquefaciens]
gi | 116073 | sp | P23939 | CEBA#BACAM (47% identity in 68
amino acids)

SEQ ID NO: 1018: -0.409362, 236, an integrase, similar to
integrase, for example, [prophage P4]
gi | 732036 | sp | P39347 | INTB#ECOLI (74% identity in 236 amino
acids)

SEQ ID NO: 1019: -0.205818, 551, novel 

SEQ ID NO: 1020: -0.198657, 1118, novel, similar to a part of
hypothetical proteins, for example, YjhT [Escherichia coli]
gi | 7404491 | sp | P39371 | YJHT#ECOLI (95% identity in 82
amino acids), TTG start

SEQ ID NO: 1021: -0.398339, 2105, novel 

SEQ ID NO: 1022: -0.508378, 944, novel, similar to putative
periplasmic protein [Campylobacter jejuni]
gi | 6968066 | emb | CAB75235.1 | (26% identity in 173 amino
acids) (at low level)

SEQ ID NO: 1023: -0.482301, 1645, novel (putative membrane
protein), similar to a part of myosin heavy chains, for example,
[Cyprinus carpio] gi | 2351223 | dbj | Baa22069.1 | (19% identity
in 292 amino acids) (at low level)

SEQ ID NO: 1024: -0.359727, 2114, novel, similar to a part of
YjiT [Escherichia coli] gi | 732099 | sp | P39391 | YJIT#ECOLI
(27% identity in 239 amino acids) (at low level), GTG start

SEQ ID NO: 1025: -0.345738, 705, novel, its N-terminal part is
similar to N-terminal part of putative RNA helicase
[Deinococcus radiodurans (strain R1)] gi | 7473663 | pir | | B75633
(29% identity in 291 amino acids); and its central part is
similar to hypothetical YjiV protein [Escherichia coli]
gi | 2851665 | sp | P39393 | YJIV#ECOLI (28% identity in 491
amino acids); a part of McrD protein [Escherichia coli]
gi | 2851619 | sp | P27301 | MCRD#ECOLI (39% identity in 131
amino acids)

SEQ ID NO: 1026: 0.04, 61, a putative ATP-dependent helicase,
similar to putative ATP-dependent helicases, for example,
[Halobacterium sp. (strain NRC1) plasmid pNRC100]
gi | 7484100 | pir | | T08316 (26% identity in 597 amino acids)

SEQ ID NO: 1027: -0.514474, 77, novel, similar to hypothetical
proteins, for example, H1130 [Halobacterium sp. (strain NRC1)
plasmid pNRC100] gi | 7484076 | pir | | T08313 (25% identity in
508 amino acids); and possible restriction/modification enzyme
[Campylobacter jejuni] gi | 6968147 | emb | CAB72964.1 | (24%
identity in 414 amino acids)

SEQ ID NO: 1028: -0.40375, 81, a putative RNA helicase,
similar to putative RNA helicases, for example, [Deinococcus
radiodurans (strain R1)] gi | 7473663 | pir | | B75633 (amino acids
at the position 78-396) (31% identity in 318 amino acids); and
(amino acids at the position 994-1708) (23% identity in 714
amino acids)

SEQ ID NO: 1468: -0.351742, 1580, a putative DNA helicase,
similar to DNA helicases, for example, putative DNA helicase
H91#ORF529 [Mycoplasma pneumoniae]
gi | 2495150 | sp | P75438 | YH91#MYCPN (24% identity in 455
amino acids); and helicase IV [Escherichia coli]
gi | 146328 | gb | aaA23952.1 | (23% identity in 513 amino acids)

SEQ ID NO: 1469: 0.14127, 64, novel, TTG start

SEQ ID NO: 1470: -0.245455, 67, novel, similar to N-terminal
part of putative membrane protein b1978 [Escherichia coli
K-12] gi | 1736642 | dbj | Baa15799.1 | (58% identity in 46 amino
acids)

SEQ ID NO: 1546: -0.622994, 736, novel 
SEQ ID NO: - : -0.059091, 89, novel 

SEQ ID NO: 1592: -0.298976, 294, novel, similar to N-terminal
part of hypothetical proteins, for example, jhp0462
[Helicobacter pylori (strain J99)] gi | 7464730 | pir | | C71929
(48% identity in 269 amino acids); and jhp0572 [Helicobacter
pylori (strain J99)] gi | 7464757 | pir | | H71914 (31% identity in
282 amino acids)

SEQ ID NO: 1593: -0.494832, 388, novel, similar to C-terminal
part of hypothetical proteins, for example, jhp0462
[Helicobacter pylori (strain J99)] gi | 7464730 | pir | | C71929
(42% identity in 423 amino acids); and HP0513 [Helicobacter
pylori (strain 26695)] gi | 7464291 | pir | | A64584 (44% identity in
423 amino acids)

SEQ ID NO: 1381: -0.367123, 585, a type I restriction-modification
enzyme S subunit, similar to type I
restriction-modification enzyme S subunits, for example,
[Citrobacter freundii] pir | S06097 | (54% identity in 584 amino
acids)

SEQ ID NO: 1382: -0.413184, 494, a type I restriction-modification
enzyme M subunit, similar to type I restriction-modification
enzyme M subunits, for example, [EcoA system]
gi | 421016 | pir | | A47200 (98% identity in 489 amino acids)
SEQ ID NO: 1383: -0.505062, 811, a type I
restriction-modification enzyme R subunit, similar to type I
restriction-modification enzyme R subunits, for example, [EcoA]
gi | 2121113 | pir | | I41291 (99% identity in 810 amino acids)
SEQ ID NO: 1384: -0.614894, 95, novel, similar to N-terminal
part of hypothetical proteins, for example, [Helicobacter pylori]
gi | 7464531 | pir | | E64694 (36% identity in 87 amino acids)

SEQ ID NO: 1385: -0.442477, 453, novel, similar to
hypothetical proteins, for example, [Streptomyces coelicolor
A3(2)] gi | 7479715 | pir | | T35601 (22% identity in 379 amino
acids) (at low level), TTG start

SEQ ID NO: 1689: -0.487222, 181, novel
[0030] 

1) Proteins which have unknown function and no significant homology to known proteins:

These proteins or polypeptides are selected from a group
comprising the following sequence list:
SEQ ID NO: 163, SEQ ID NO: 164, SEQ ID NO: 165, SEQ ID NO: 166,
SEQ ID NO: 167, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 170,
SEQ ID NO: 171, SEQ ID NO: 172, SEQ ID NO: 173, SEQ ID NO: 174,
SEQ ID NO: 175, SEQ ID NO: 176, SEQ ID NO: 177, SEQ ID NO: 178,
SEQ ID NO: 179, SEQ ID NO: 180, SEQ ID NO: 181, SEQ ID NO: 182,
SEQ ID NO: 183, SEQ ID NO: 184, SEQ ID NO: 185, SEQ ID NO: 186,
SEQ ID NO: 187, SEQ ID NO: 188, SEQ ID NO: 189, SEQ ID NO: 190,
SEQ ID NO: 191, SEQ ID NO: 192, SEQ ID NO: 193, SEQ ID NO: 194,
SEQ ID NO: 195, SEQ ID NO: 196, SEQ ID NO: 197, SEQ ID NO: 198,
SEQ ID NO: 199, SEQ ID NO: 200, SEQ ID NO: 201, SEQ ID NO: 202,
SEQ ID NO: 203, SEQ ID NO: 204, SEQ ID NO: 205, SEQ ID NO: 206,
SEQ ID NO: 207, SEQ ID NO: 208, SEQ ID NO: 209, SEQ ID NO: 210,
SEQ ID NO: 211, SEQ ID NO: 212, SEQ ID NO: 213, SEQ ID NO: 214,
SEQ ID NO: 215, SEQ ID NO: 216, SEQ ID NO: 217, SEQ ID NO: 218,
SEQ ID NO: 219, SEQ ID NO: 220, SEQ ID NO: 221, SEQ ID NO: 222,
SEQ ID NO: 223, SEQ ID NO: 224, SEQ ID NO: 225, SEQ ID NO: 226,
SEQ ID NO: 227, SEQ ID NO: 228, SEQ ID NO: 229, SEQ ID NO: 230,
SEQ ID NO: 231, SEQ ID NO: 232, SEQ ID NO: 233, SEQ ID NO: 234,
SEQ ID NO: 235, SEQ ID NO: 236, SEQ ID NO: 237, SEQ ID NO: 238,
SEQ ID NO: 239, SEQ ID NO: 240, SEQ ID NO: 241, SEQ ID NO: 242,
SEQ ID NO: 243, SEQ ID NO: 485, SEQ ID NO: 486, SEQ ID NO: 487,
SEQ ID NO: 488, SEQ ID NO: 489, SEQ ID NO: 490, SEQ ID NO: 491,
SEQ ID NO: 492, SEQ ID NO: 493, SEQ ID NO: 494, SEQ ID NO: 495,
SEQ ID NO: 496, SEQ ID NO: 497, SEQ ID NO: 498, SEQ ID NO: 499,
SEQ ID NO: 500, SEQ ID NO: 501, SEQ ID NO: 502, SEQ ID NO: 503,
SEQ ID NO: 504, SEQ ID NO: 505, SEQ ID NO: 506, SEQ ID NO: 507,
SEQ ID NO: 508, SEQ ID NO: 509, SEQ ID NO: 510, SEQ ID NO: 511,
SEQ ID NO: 512, SEQ ID NO: 513, SEQ ID NO: 514, SEQ ID NO: 515,
SEQ ID NO: 516, SEQ ID NO: 517, SEQ ID NO: 518, SEQ ID NO: 519,
SEQ ID NO: 520, SEQ ID NO: 521, SEQ ID NO: 522, SEQ ID NO: 523,
SEQ ID NO: 524, SEQ ID NO: 525, SEQ ID NO: 526, SEQ ID NO: 527,
SEQ ID NO: 528, SEQ ID NO: 529, SEQ ID NO: 530, SEQ ID NO: 531,
SEQ ID NO: 532, SEQ ID NO: 533, SEQ ID NO: 534, SEQ ID NO: 535,
SEQ ID NO: 536, SEQ ID NO: 537, SEQ ID NO: 538, SEQ ID NO: 539,
SEQ ID NO: 540, SEQ ID NO: 541, SEQ ID NO: 542, SEQ ID NO: 543,
SEQ ID NO: 544, SEQ ID NO: 545, SEQ ID NO: 546, SEQ ID NO: 547,
SEQ ID NO: 548, SEQ ID NO: 549, SEQ ID NO: 550, SEQ ID NO: 551,
SEQ ID NO: 552, SEQ ID NO: 553, SEQ ID NO: 631, SEQ ID NO: 632,
SEQ ID NO: 633, SEQ ID NO: 634, SEQ ID NO: 635, SEQ ID NO: 636,
SEQ ID NO: 637, SEQ ID NO: 638, SEQ ID NO: 639, SEQ ID NO: 640,
SEQ ID NO: 641, SEQ ID NO: 642, SEQ ID NO: 643, SEQ ID NO: 644,
SEQ ID NO: 928, SEQ ID NO: 929, SEQ ID NO: 930, SEQ ID NO: 931,
SEQ ID NO: 932, SEQ ID NO: 933, SEQ ID NO: 934, SEQ ID NO: 935,
SEQ ID NO: 936, SEQ ID NO: 937, SEQ ID NO: 938, SEQ ID NO: 939,
SEQ ID NO: 940, SEQ ID NO: 941, SEQ ID NO: 942, SEQ ID NO: 943,
SEQ ID NO: 944, SEQ ID NO: 945, SEQ ID NO: 946, SEQ ID NO: 979,
SEQ ID NO: 980, SEQ ID NO: 981, SEQ ID NO: 982, SEQ ID NO: 983,
SEQ ID NO: 984, SEQ ID NO: 985, SEQ ID NO: 986, SEQ ID NO: 987,
SEQ ID NO: 988, SEQ ID NO: 989, SEQ ID NO: 990, SEQ ID NO: 991,
SEQ ID NO: 992, SEQ ID NO: 993, SEQ ID NO: 994, SEQ ID NO: 995,
SEQ ID NO: 996, SEQ ID NO: 997, SEQ ID NO: 998, SEQ ID NO: 999,
SEQ ID NO: 1000, SEQ ID NO: 1001, SEQ ID NO: 1002, SEQ ID NO: 1003,
SEQ ID NO: 1004, SEQ ID NO: 1005, SEQ ID NO: 1006, SEQ ID NO: 1008,
SEQ ID NO: 1009, SEQ ID NO: 1010, SEQ ID NO: 1011, SEQ ID NO: 1012,
SEQ ID NO: 1056, SEQ ID NO: 1057, SEQ ID NO: 1058, SEQ ID NO: 1059,
SEQ ID NO: 1094, SEQ ID NO: 1095, SEQ ID NO: 1096, SEQ ID NO: 1097,
SEQ ID NO: 1098, SEQ ID NO: 1099, SEQ ID NO: 1100, SEQ ID NO: 1101,
SEQ ID NO: 1102, SEQ ID NO: 1103, SEQ ID NO: 1104, SEQ ID NO: 1105,
SEQ ID NO: 1106, SEQ ID NO: 1107, SEQ ID NO: 1108, SEQ ID NO: 1109,
SEQ ID NO: 1110, SEQ ID NO: 1111, SEQ ID NO: 1112, SEQ ID NO: 1113,
SEQ ID NO: 1114, SEQ ID NO: 1115, SEQ ID NO: 1116, SEQ ID NO: 1117,
SEQ ID NO: 1118, SEQ ID NO: 1119, SEQ ID NO: 1120, SEQ ID NO: 1121,
SEQ ID NO: 1122, SEQ ID NO: 1123, SEQ ID NO: 1124, SEQ ID NO: 1125,
SEQ ID NO: 1126, SEQ ID NO: 1127, SEQ ID NO: 1213, SEQ ID NO: 1214,
SEQ ID NO: 1215, SEQ ID NO: 1216, SEQ ID NO: 1217, SEQ ID NO: 1218,
SEQ ID NO: 1219, SEQ ID NO: 1220, SEQ ID NO: 1221, SEQ ID NO: 1222,
SEQ ID NO: 1223, SEQ ID NO: 1224, SEQ ID NO: 1225, SEQ ID NO: 1226,
SEQ ID NO: 1227, SEQ ID NO: 1228, SEQ ID NO: 1229, SEQ ID NO: 1230,
SEQ ID NO: 1231, SEQ ID NO: 1232, SEQ ID NO: 1233, SEQ ID NO: 1234,
SEQ ID NO: 1235, SEQ ID NO: 1236, SEQ ID NO: 1237, SEQ ID NO: 1238,
SEQ ID NO: 1239, SEQ ID NO: 1275, SEQ ID NO: 1276, SEQ ID NO: 1277,
SEQ ID NO: 1278, SEQ ID NO: 1279, SEQ ID NO: 1280, SEQ ID NO: 1281,
SEQ ID NO: 1282, SEQ ID NO: 1284, SEQ ID NO: 1285, SEQ ID NO: 1286,
SEQ ID NO: 1287, SEQ ID NO: 1303, SEQ ID NO: 1304, SEQ ID NO: 1305,
SEQ ID NO: 1306, SEQ ID NO: 1307, SEQ ID NO: 1308, SEQ ID NO: 1360,
SEQ ID NO: 1361, SEQ ID NO: 1362, SEQ ID NO: 1363, SEQ ID NO: 1364,
SEQ ID NO: 1365, SEQ ID NO: 1387, SEQ ID NO: 1388, SEQ ID NO: 1389,
SEQ ID NO: 1390, SEQ ID NO: 1391, SEQ ID NO: 1392, SEQ ID NO: 1393,
SEQ ID NO: 1437, SEQ ID NO: 1438, SEQ ID NO: 1439, SEQ ID NO: 1440,
SEQ ID NO: 1441, SEQ ID NO: 1442, SEQ ID NO: 1451, SEQ ID NO: 1452,
SEQ ID NO: 1453, SEQ ID NO: 1454, SEQ ID NO: 1455, SEQ ID NO: 1456,
SEQ ID NO: 1474, SEQ ID NO: 1475, SEQ ID NO: 1476, SEQ ID NO: 1479,
SEQ ID NO: 1480, SEQ ID NO: 1481, SEQ ID NO: 1482, SEQ ID NO: 1483,
SEQ ID NO: 1484, SEQ ID NO: 1485, SEQ ID NO: 1486, SEQ ID NO: 1495,
SEQ ID NO: 1496, SEQ ID NO: 1497, SEQ ID NO: 1498, SEQ ID NO: 1500,
SEQ ID NO: 1502, SEQ ID NO: 1503, SEQ ID NO: 1504, SEQ ID NO: 1505,
SEQ ID NO: 1559, SEQ ID NO: 1560, SEQ ID NO: 1561, SEQ ID NO: 1562,
SEQ ID NO: 1577, SEQ ID NO: 1578, SEQ ID NO: 1579, SEQ ID NO: 1602,
SEQ ID NO: 1606, SEQ ID NO: 1625, SEQ ID NO: 1663, SEQ ID NO: 1697,
SEQ ID NO: 1698, SEQ ID NO: 1702 and SEQ ID NO: 1703.
These proteins or polypeptides are specific to O-157:H7. No
significant homology to any data registered in the gene data
bank is found from the determined amino acid sequence
information, and their functions and the like are not known.
However, as shown in Table 1, a protein among them predicted to
be a cell surface protein (a membrane protein, especially an
outer membrane protein (OMP) or a lipoprotein), or its gene (or
nucleic-acid molecule), may be useful for production of an
antibody or a vaccine composition, diagnosis of O-157 infection,
and the like. Furthermore, they may include a protein that has
an important function in O-157, for example in transportation
and metabolism of a substance or processing of nucleic acids, or
that relates to a regulatory element or pathogenicity. Such
proteins are expected to be useful for diagnosis and therapy of
O-157 infection.
[0031] 

2) Proteins which have unknown function, but have significant
homology to that of other bacteria:

These proteins or polypeptides are selected from a group
comprising the following sequence list:
SEQ ID NO: 02, SEQ ID NO: 03, SEQ ID NO: 04, SEQ ID NO: 05,
SEQ ID NO: 06, SEQ ID NO: 07, SEQ ID NO: 08, SEQ ID NO: 09,
SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13,
SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17,
SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21,
SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25,
SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29,
SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 245,
SEQ ID NO: 246, SEQ ID NO: 247, SEQ ID NO: 248, SEQ ID NO: 249,
SEQ ID NO: 250, SEQ ID NO: 251, SEQ ID NO: 252, SEQ ID NO: 253,
SEQ ID NO: 254, SEQ ID NO: 255, SEQ ID NO: 256, SEQ ID NO: 257,
SEQ ID NO: 258, SEQ ID NO: 259, SEQ ID NO: 260, SEQ ID NO: 261,
SEQ ID NO: 262, SEQ ID NO: 263, SEQ ID NO: 264, SEQ ID NO: 265,
SEQ ID NO: 266, SEQ ID NO: 267, SEQ ID NO: 268, SEQ ID NO: 270,
SEQ ID NO: 271, SEQ ID NO: 272, SEQ ID NO: 273, SEQ ID NO: 338,
SEQ ID NO: 339, SEQ ID NO: 340, SEQ ID NO: 341, SEQ ID NO: 342,
SEQ ID NO: 343, SEQ ID NO: 344, SEQ ID NO: 345, SEQ ID NO: 346,
SEQ ID NO: 347, SEQ ID NO: 348, SEQ ID NO: 349, SEQ ID NO: 350,
SEQ ID NO: 351, SEQ ID NO: 352, SEQ ID NO: 353, SEQ ID NO: 354,
SEQ ID NO: 355, SEQ ID NO: 356, SEQ ID NO: 357, SEQ ID NO: 358,
SEQ ID NO: 359, SEQ ID NO: 360, SEQ ID NO: 361, SEQ ID NO: 362,
SEQ ID NO: 363, SEQ ID NO: 364, SEQ ID NO: 365, SEQ ID NO: 366,
SEQ ID NO: 367, SEQ ID NO: 368, SEQ ID NO: 369, SEQ ID NO: 370,
SEQ ID NO: 371, SEQ ID NO: 372, SEQ ID NO: 373, SEQ ID NO: 374,
SEQ ID NO: 375, SEQ ID NO: 376, SEQ ID NO: 377, SEQ ID NO: 378,
SEQ ID NO: 379, SEQ ID NO: 380, SEQ ID NO: 381, SEQ ID NO: 382,
SEQ ID NO: 383, SEQ ID NO: 384, SEQ ID NO: 385, SEQ ID NO: 386,
SEQ ID NO: 387, SEQ ID NO: 388, SEQ ID NO: 389, SEQ ID NO: 390,
SEQ ID NO: 391, SEQ ID NO: 392, SEQ ID NO: 393, SEQ ID NO: 394,
SEQ ID NO: 395, SEQ ID NO: 396, SEQ ID NO: 397, SEQ ID NO: 398,
SEQ ID NO: 399, SEQ ID NO: 400, SEQ ID NO: 401, SEQ ID NO: 402,
SEQ ID NO: 403, SEQ ID NO: 404, SEQ ID NO: 405, SEQ ID NO: 406,
SEQ ID NO: 407, SEQ ID NO: 408, SEQ ID NO: 409, SEQ ID NO: 411,
SEQ ID NO: 412, SEQ ID NO: 413, SEQ ID NO: 414, SEQ ID NO: 416,
SEQ ID NO: 417, SEQ ID NO: 418, SEQ ID NO: 419, SEQ ID NO: 420,
SEQ ID NO: 421, SEQ ID NO: 422, SEQ ID NO: 423, SEQ ID NO: 424,
SEQ ID NO: 425, SEQ ID NO: 426, SEQ ID NO: 427, SEQ ID NO: 428,
SEQ ID NO: 429, SEQ ID NO: 430, SEQ ID NO: 431, SEQ ID NO: 432,
SEQ ID NO: 433, SEQ ID NO: 434, SEQ ID NO: 435, SEQ ID NO: 436,
SEQ ID NO: 437, SEQ ID NO: 438, SEQ ID NO: 439, SEQ ID NO: 440,
SEQ ID NO: 441, SEQ ID NO: 442, SEQ ID NO: 443, SEQ ID NO: 444,
SEQ ID NO: 445, SEQ ID NO: 446, SEQ ID NO: 447, SEQ ID NO: 448,
SEQ ID NO: 449, SEQ ID NO: 450, SEQ ID NO: 451, SEQ ID NO: 452,
SEQ ID NO: 453, SEQ ID NO: 454, SEQ ID NO: 455, SEQ ID NO: 456,
SEQ ID NO: 457, SEQ ID NO: 458, SEQ ID NO: 459, SEQ ID NO: 460,
SEQ ID NO: 461, SEQ ID NO: 462, SEQ ID NO: 463, SEQ ID NO: 464,
SEQ ID NO: 465, SEQ ID NO: 466, SEQ ID NO: 467, SEQ ID NO: 468,
SEQ ID NO: 469, SEQ ID NO: 470, SEQ ID NO: 471, SEQ ID NO: 472,
SEQ ID NO: 473, SEQ ID NO: 474, SEQ ID NO: 475, SEQ ID NO: 476,
SEQ ID NO: 477, SEQ ID NO: 478, SEQ ID NO: 479, SEQ ID NO: 480,
SEQ ID NO: 481, SEQ ID NO: 482, SEQ ID NO: 483, SEQ ID NO: 645,
SEQ ID NO: 646, SEQ ID NO: 647, SEQ ID NO: 648, SEQ ID NO: 649,
SEQ ID NO: 650, SEQ ID NO: 651, SEQ ID NO: 652, SEQ ID NO: 653,
SEQ ID NO: 654, SEQ ID NO: 655, SEQ ID NO: 656, SEQ ID NO: 657,
SEQ ID NO: 658, SEQ ID NO: 659, SEQ ID NO: 660, SEQ ID NO: 661,
SEQ ID NO: 662, SEQ ID NO: 663, SEQ ID NO: 664, SEQ ID NO: 665,
SEQ ID NO: 666, SEQ ID NO: 667, SEQ ID NO: 668, SEQ ID NO: 669,
SEQ ID NO: 670, SEQ ID NO: 671, SEQ ID NO: 672, SEQ ID NO: 673,
SEQ ID NO: 674, SEQ ID NO: 675, SEQ ID NO: 676, SEQ ID NO: 677,
SEQ ID NO: 678, SEQ ID NO: 679, SEQ ID NO: 680, SEQ ID NO: 681,
SEQ ID NO: 682, SEQ ID NO: 683, SEQ ID NO: 684, SEQ ID NO: 685,
SEQ ID NO: 686, SEQ ID NO: 687, SEQ ID NO: 688, SEQ ID NO: 877,
SEQ ID NO: 878, SEQ ID NO: 879, SEQ ID NO: 880, SEQ ID NO: 881,
SEQ ID NO: 882, SEQ ID NO: 883, SEQ ID NO: 884, SEQ ID NO: 885,
SEQ ID NO: 886, SEQ ID NO: 887, SEQ ID NO: 888, SEQ ID NO: 889,
SEQ ID NO: 890, SEQ ID NO: 891, SEQ ID NO: 892, SEQ ID NO: 893,
SEQ ID NO: 894, SEQ ID NO: 895, SEQ ID NO: 896, SEQ ID NO: 897,
SEQ ID NO: 898, SEQ ID NO: 899, SEQ ID NO: 900, SEQ ID NO: 901,
SEQ ID NO: 902, SEQ ID NO: 903, SEQ ID NO: 904, SEQ ID NO: 905,
SEQ ID NO: 906, SEQ ID NO: 907, SEQ ID NO: 908, SEQ ID NO: 909,
SEQ ID NO: 910, SEQ ID NO: 911, SEQ ID NO: 912, SEQ ID NO: 913,
SEQ ID NO: 914, SEQ ID NO: 915, SEQ ID NO: 916, SEQ ID NO: 917,
SEQ ID NO: 918, SEQ ID NO: 919, SEQ ID NO: 920, SEQ ID NO: 921,
SEQ ID NO: 922, SEQ ID NO: 923, SEQ ID NO: 924, SEQ ID NO: 925,
SEQ ID NO: 926, SEQ ID NO: 947, SEQ ID NO: 948, SEQ ID NO: 949,
SEQ ID NO: 950, SEQ ID NO: 951, SEQ ID NO: 952, SEQ ID NO: 953,
SEQ ID NO: 954, SEQ ID NO: 955, SEQ ID NO: 956, SEQ ID NO: 957,
SEQ ID NO: 958, SEQ ID NO: 959, SEQ ID NO: 960, SEQ ID NO: 961,
SEQ ID NO: 962, SEQ ID NO: 963, SEQ ID NO: 964, SEQ ID NO: 965,
SEQ ID NO: 966, SEQ ID NO: 967, SEQ ID NO: 968, SEQ ID NO: 968,
SEQ ID NO: 969, SEQ ID NO: 970, SEQ ID NO: 971, SEQ ID NO: 972,
SEQ ID NO: 973, SEQ ID NO: 1026, SEQ ID NO: 1027, SEQ ID NO: 1028,
SEQ ID NO: 1375, SEQ ID NO: 1376, SEQ ID NO: 1377, SEQ ID NO: 1378,
SEQ ID NO: 1379, SEQ ID NO: 1410, SEQ ID NO: 1419, SEQ ID NO: 1420,
SEQ ID NO: 1421, SEQ ID NO: 1422, SEQ ID NO: 1423, SEQ ID NO: 1424,
SEQ ID NO: 1425, SEQ ID NO: 1488, SEQ ID NO: 1517, SEQ ID NO: 1516,
SEQ ID NO: 1517, SEQ ID NO: 1538, SEQ ID NO: 1539, SEQ ID NO: 1550,
SEQ ID NO: 1567, SEQ ID NO: 1568, SEQ ID NO: 1608, SEQ ID NO: 1609,
SEQ ID NO: 1610, SEQ ID NO: 1611, SEQ ID NO: 1628, SEQ ID NO: 1633,
SEQ ID NO: 1634, SEQ ID NO: 1641, SEQ ID NO: 1642, SEQ ID NO: 1644,
SEQ ID NO: 1645, SEQ ID NO: 1665, SEQ ID NO: 1676, and SEQ ID NO: 1681.
These proteins or polypeptides are specific to O-157:H7, and
significant homology to data registered in the gene data bank is
found from the determined amino acid sequence information,
whereas their functions and the like are not known. However, as
shown in Table 1, a protein among them predicted to be a cell
surface protein (a membrane protein, especially an OMP or a
lipoprotein), or its gene (or nucleic-acid molecule), may be
useful for production of an antibody or a vaccine composition,
diagnosis of O-157 infection, and the like. Furthermore, they
may include a protein that has an important function in O-157,
for example in transportation and metabolism of a substance or
processing of nucleic acids, or that relates to a regulatory
element or pathogenicity. Such proteins are expected to be
useful for diagnosis and therapy of O-157 infection.
[0032] 

3) Proteins comprising Insertion Sequence (IS) 

These proteins or polypeptides are selected from a group
comprising the following sequence list:
SEQ ID NO: 133, SEQ ID NO: 134, SEQ ID NO: 135, SEQ ID NO: 136,
SEQ ID NO: 137, SEQ ID NO: 138, SEQ ID NO: 139, SEQ ID NO: 140,
SEQ ID NO: 141, SEQ ID NO: 142, SEQ ID NO: 143, SEQ ID NO: 144,
SEQ ID NO: 145, SEQ ID NO: 146, SEQ ID NO: 147, SEQ ID NO: 148,
SEQ ID NO: 149, SEQ ID NO: 150, SEQ ID NO: 151, SEQ ID NO: 152,
SEQ ID NO: 153, SEQ ID NO: 154, SEQ ID NO: 155, SEQ ID NO: 156,
SEQ ID NO: 157, SEQ ID NO: 158, SEQ ID NO: 159, SEQ ID NO: 160,
SEQ ID NO: 161, SEQ ID NO: 162, SEQ ID NO: 279, SEQ ID NO: 280,
SEQ ID NO: 281, SEQ ID NO: 282, SEQ ID NO: 283, SEQ ID NO: 284,
SEQ ID NO: 285, SEQ ID NO: 286, SEQ ID NO: 287, SEQ ID NO: 288,
SEQ ID NO: 289, SEQ ID NO: 290, SEQ ID NO: 291, SEQ ID NO: 292,
SEQ ID NO: 293, SEQ ID NO: 294, SEQ ID NO: 295, SEQ ID NO: 296,
SEQ ID NO: 297, SEQ ID NO: 298, SEQ ID NO: 299, SEQ ID NO: 300,
SEQ ID NO: 301, SEQ ID NO: 302, SEQ ID NO: 303, SEQ ID NO: 304,
SEQ ID NO: 305, SEQ ID NO: 306, SEQ ID NO: 307, SEQ ID NO: 308,
SEQ ID NO: 309, SEQ ID NO: 310, SEQ ID NO: 311, SEQ ID NO: 312,
SEQ ID NO: 313, SEQ ID NO: 314, SEQ ID NO: 315, SEQ ID NO: 316,
SEQ ID NO: 317, SEQ ID NO: 318, SEQ ID NO: 319, SEQ ID NO: 320,
SEQ ID NO: 321, SEQ ID NO: 322, SEQ ID NO: 323, SEQ ID NO: 324,
SEQ ID NO: 325, SEQ ID NO: 326, SEQ ID NO: 327, SEQ ID NO: 328,
SEQ ID NO: 329, SEQ ID NO: 330, SEQ ID NO: 331, SEQ ID NO: 332,
SEQ ID NO: 333, SEQ ID NO: 334, SEQ ID NO: 335, SEQ ID NO: 336,
SEQ ID NO: 1030, SEQ ID NO: 1031, SEQ ID NO: 1032, SEQ ID NO: 1033,
SEQ ID NO: 1034, SEQ ID NO: 1035, SEQ ID NO: 1036, SEQ ID NO: 1037,
SEQ ID NO: 1038, SEQ ID NO: 1039, SEQ ID NO: 1040, SEQ ID NO: 1041,
SEQ ID NO: 1042, SEQ ID NO: 1043, SEQ ID NO: 1044, SEQ ID NO: 1045,
SEQ ID NO: 1046, SEQ ID NO: 1047, SEQ ID NO: 1048, SEQ ID NO: 1049,
SEQ ID NO: 1050, SEQ ID NO: 1051, SEQ ID NO: 1052, SEQ ID NO: 1053,
SEQ ID NO: 1054, and SEQ ID NO: 1570.
These proteins and their genes (or nucleic-acid molecules) are
useful for detection and diagnosis of O-157 infection.
[0033]

4) Proteins relating to phage:

These proteins or polypeptides are selected from a group
comprising the following sequence list:
SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36,
SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 40,
SEQ ID NO: 41, SEQ ID NO: 42, SEQ ID NO: 43, SEQ ID NO: 44,
SEQ ID NO: 45, SEQ ID NO: 46, SEQ ID NO: 47, SEQ ID NO: 48,
SEQ ID NO: 49, SEQ ID NO: 50, SEQ ID NO: 51, SEQ ID NO: 52,
SEQ ID NO: 53, SEQ ID NO: 54, SEQ ID NO: 55, SEQ ID NO: 56,
SEQ ID NO: 57, SEQ ID NO: 58, SEQ ID NO: 59, SEQ ID NO: 60,
SEQ ID NO: 61, SEQ ID NO: 62, SEQ ID NO: 63, SEQ ID NO: 64,
SEQ ID NO: 65, SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68,
SEQ ID NO: 69, SEQ ID NO: 70, SEQ ID NO: 71, SEQ ID NO: 72,
SEQ ID NO: 73, SEQ ID NO: 74, SEQ ID NO: 75, SEQ ID NO: 76,
SEQ ID NO: 77, SEQ ID NO: 78, SEQ ID NO: 79, SEQ ID NO: 80,
SEQ ID NO: 81, SEQ ID NO: 82, SEQ ID NO: 83, SEQ ID NO: 84,
SEQ ID NO: 85, SEQ ID NO: 86, SEQ ID NO: 87, SEQ ID NO: 88,
SEQ ID NO: 89, SEQ ID NO: 90, SEQ ID NO: 91, SEQ ID NO: 92,
SEQ ID NO: 93, SEQ ID NO: 94, SEQ ID NO: 95, SEQ ID NO: 96,
SEQ ID NO: 97, SEQ ID NO: 98, SEQ ID NO: 99, SEQ ID NO: 100,
SEQ ID NO: 101, SEQ ID NO: 102, SEQ ID NO: 103, SEQ ID NO: 104,
SEQ ID NO: 105, SEQ ID NO: 106, SEQ ID NO: 107, SEQ ID NO: 108,
SEQ ID NO: 109, SEQ ID NO: 110, SEQ ID NO: 111, SEQ ID NO: 112,
SEQ ID NO: 113, SEQ ID NO: 114, SEQ ID NO: 115, SEQ ID NO: 116,
SEQ ID NO: 117, SEQ ID NO: 118, SEQ ID NO: 119, SEQ ID NO: 120,
SEQ ID NO: 121, SEQ ID NO: 122, SEQ ID NO: 123, SEQ ID NO: 124,
SEQ ID NO: 125, SEQ ID NO: 126, SEQ ID NO: 127, SEQ ID NO: 128,
SEQ ID NO: 129, SEQ ID NO: 130, SEQ ID NO: 131, SEQ ID NO: 555,
SEQ ID NO: 556, SEQ ID NO: 557, SEQ ID NO: 558, SEQ ID NO: 559,
SEQ ID NO: 560, SEQ ID NO: 561, SEQ ID NO: 562, SEQ ID NO: 563,
SEQ ID NO: 564, SEQ ID NO: 565, SEQ ID NO: 566, SEQ ID NO: 567,
SEQ ID NO: 568, SEQ ID NO: 569, SEQ ID NO: 570, SEQ ID NO: 571,
SEQ ID NO: 572, SEQ ID NO: 573, SEQ ID NO: 574, SEQ ID NO: 575,
SEQ ID NO: 576, SEQ ID NO: 577, SEQ ID NO: 578, SEQ ID NO: 579,
SEQ ID NO: 580, SEQ ID NO: 581, SEQ ID NO: 582, SEQ ID NO: 583,
SEQ ID NO: 584, SEQ ID NO: 585, SEQ ID NO: 586, SEQ ID NO: 587,
SEQ ID NO: 588, SEQ ID NO: 589, SEQ ID NO: 590, SEQ ID NO: 591,
SEQ ID NO: 592, SEQ ID NO: 593, SEQ ID NO: 594, SEQ ID NO: 595,
SEQ ID NO: 596, SEQ ID NO: 597, SEQ ID NO: 598, SEQ ID NO: 599,
SEQ ID NO: 600, SEQ ID NO: 601, SEQ ID NO: 602, SEQ ID NO: 603,
SEQ ID NO: 604, SEQ ID NO: 605, SEQ ID NO: 606, SEQ ID NO: 607,
SEQ ID NO: 608, SEQ ID NO: 609, SEQ ID NO: 610, SEQ ID NO: 611,
SEQ ID NO: 612, SEQ ID NO: 613, SEQ ID NO: 614, SEQ ID NO: 615,
SEQ ID NO: 616, SEQ ID NO: 617, SEQ ID NO: 618, SEQ ID NO: 619,
SEQ ID NO: 620, SEQ ID NO: 621, SEQ ID NO: 622, SEQ ID NO: 623,
SEQ ID NO: 624, SEQ ID NO: 625, SEQ ID NO: 626, SEQ ID NO: 627,
SEQ ID NO: 628, SEQ ID NO: 629, SEQ ID NO: 756, SEQ ID NO: 757,
SEQ ID NO: 758, SEQ ID NO: 759, SEQ ID NO: 760, SEQ ID NO: 761,
SEQ ID NO: 762, SEQ ID NO: 763, SEQ ID NO: 764, SEQ ID NO: 765,
SEQ ID NO: 766, SEQ ID NO: 767, SEQ ID NO: 768, SEQ ID NO: 769,
SEQ ID NO: 770, SEQ ID NO: 771, SEQ ID NO: 772, SEQ ID NO: 773,
SEQ ID NO: 774, SEQ ID NO: 775, SEQ ID NO: 776, SEQ ID NO: 777,
SEQ ID NO: 778, SEQ ID NO: 779, SEQ ID NO: 780, SEQ ID NO: 781,
SEQ ID NO: 782, SEQ ID NO: 783, SEQ ID NO: 784, SEQ ID NO: 785,
SEQ ID NO: 786, SEQ ID NO: 787, SEQ ID NO: 788, SEQ ID NO: 789,
SEQ ID NO: 790, SEQ ID NO: 791, SEQ ID NO: 792, SEQ ID NO: 793,
SEQ ID NO: 794, SEQ ID NO: 795, SEQ ID NO: 796, SEQ ID NO: 797,
SEQ ID NO: 798, SEQ ID NO: 799, SEQ ID NO: 800, SEQ ID NO: 801,
SEQ ID NO: 802, SEQ ID NO: 803, SEQ ID NO: 804, SEQ ID NO: 805,
SEQ ID NO: 806, SEQ ID NO: 807, SEQ ID NO: 808, SEQ ID NO: 809,
SEQ ID NO: 810, SEQ ID NO: 811, SEQ ID NO: 812, SEQ ID NO: 813,
SEQ ID NO: 814, SEQ ID NO: 815, SEQ ID NO: 1061, SEQ ID NO: 1062,
SEQ ID NO: 1063, SEQ ID NO: 1064, SEQ ID NO: 1065, SEQ ID NO: 1066,
SEQ ID NO: 1067, SEQ ID NO: 1068, SEQ ID NO: 1069, SEQ ID NO: 1070,
SEQ ID NO: 1071, SEQ ID NO: 1072, SEQ ID NO: 1073, SEQ ID NO: 1074,
SEQ ID NO: 1075, SEQ ID NO: 1076, SEQ ID NO: 1077, SEQ ID NO: 1078,
SEQ ID NO: 1079, SEQ ID NO: 1080, SEQ ID NO: 1081, SEQ ID NO: 1082,
SEQ ID NO: 1083, SEQ ID NO: 1084, SEQ ID NO: 1085, SEQ ID NO: 1086,
SEQ ID NO: 1087, SEQ ID NO: 1088, SEQ ID NO: 1089, SEQ ID NO: 1090,
SEQ ID NO: 1091, SEQ ID NO: 1092, SEQ ID NO: 1158, SEQ ID NO: 1159,
SEQ ID NO: 1160, SEQ ID NO: 1161, SEQ ID NO: 1162, SEQ ID NO: 1163,
SEQ ID NO: 1164, SEQ ID NO: 1165, SEQ ID NO: 1166, SEQ ID NO: 1167,
SEQ ID NO: 1168, SEQ ID NO: 1169, SEQ ID NO: 1170, SEQ ID NO: 1171,
SEQ ID NO: 1172, SEQ ID NO: 1173, SEQ ID NO: 1174, SEQ ID NO: 1175,
SEQ ID NO: 1176, SEQ ID NO: 1177, SEQ ID NO: 1178, SEQ ID NO: 1179,
SEQ ID NO: 1180, SEQ ID NO: 1181, SEQ ID NO: 1182, SEQ ID NO: 1183,
SEQ ID NO: 1184, SEQ ID NO: 1185, SEQ ID NO: 1186, SEQ ID NO: 1187,
SEQ ID NO: 1188, SEQ ID NO: 1189, SEQ ID NO: 1190, SEQ ID NO: 1259,
SEQ ID NO: 1260, SEQ ID NO: 1261, SEQ ID NO: 1262, SEQ ID NO: 1263,
SEQ ID NO: 1264, SEQ ID NO: 1265, SEQ ID NO: 1266, SEQ ID NO: 1267,
SEQ ID NO: 1268, SEQ ID NO: 1269, SEQ ID NO: 1270, SEQ ID NO: 1271,
SEQ ID NO: 1272, SEQ ID NO: 1273, SEQ ID NO: 1289, SEQ ID NO: 1290,
SEQ ID NO: 1291, SEQ ID NO: 1292, SEQ ID NO: 1293, SEQ ID NO: 1294,
SEQ ID NO: 1295, SEQ ID NO: 1296, SEQ ID NO: 1297, SEQ ID NO: 1298,
SEQ ID NO: 1299, SEQ ID NO: 1300, SEQ ID NO: 1301, SEQ ID NO: 1330,
SEQ ID NO: 1331, SEQ ID NO: 1332, SEQ ID NO: 1333, SEQ ID NO: 1334,
SEQ ID NO: 1349, SEQ ID NO: 1350, SEQ ID NO: 1351, SEQ ID NO: 1352,
SEQ ID NO: 1353, SEQ ID NO: 1354, SEQ ID NO: 1355, SEQ ID NO: 1356,
SEQ ID NO: 1357, SEQ ID NO: 1358, SEQ ID NO: 1445, SEQ ID NO: 1446,
SEQ ID NO: 1446, SEQ ID NO: 1447, SEQ ID NO: 1448, SEQ ID NO: 1449,
SEQ ID NO: 1490, SEQ ID NO: 1491, SEQ ID NO: 1492, SEQ ID NO: 1493,
SEQ ID NO: 1509, SEQ ID NO: 1541, SEQ ID NO: 1542, SEQ ID NO: 1543,
SEQ ID NO: 1544, SEQ ID NO: 1554, SEQ ID NO: 1572, SEQ ID NO: 1573,
SEQ ID NO: 1574, SEQ ID NO: 1575, SEQ ID NO: 1581, SEQ ID NO: 1582,
SEQ ID NO: 1583, SEQ ID NO: 1588, SEQ ID NO: 1589, SEQ ID NO: 1590,
SEQ ID NO: 1597, SEQ ID NO: 1598, SEQ ID NO: 1623, SEQ ID NO: 1647,
SEQ ID NO: 1648, SEQ ID NO: 1650, SEQ ID NO: 1651, SEQ ID NO: 1653,
SEQ ID NO: 1654, SEQ ID NO: 1692, and SEQ ID NO: 1693.
These proteins and polypeptides are specific to O-157:H7 and
are derived from phage. These proteins and their genes (or
nucleic-acid molecules) are useful for detection and diagnosis
of O-157 infection.
[0034]

5) Regulatory elements:

These proteins or polypeptides are selected from the
group comprising the following sequence list:
SEQ ID NO: 1147, SEQ ID NO: 1148, SEQ ID NO: 1149, SEQ ID NO: 1150,
SEQ ID NO: 1151, SEQ ID NO: 1152, SEQ ID NO: 1153, SEQ ID NO: 1154,
SEQ ID NO: 1155, SEQ ID NO: 1156, SEQ ID NO: 1192, SEQ ID NO: 1193,
SEQ ID NO: 1194, SEQ ID NO: 1335, SEQ ID NO: 1336, SEQ ID NO: 1337,
SEQ ID NO: 1402, SEQ ID NO: 1403, SEQ ID NO: 1404, SEQ ID NO: 1405,
SEQ ID NO: 1406, SEQ ID NO: 1407, SEQ ID NO: 1468, SEQ ID NO: 1512,
SEQ ID NO: 1513, SEQ ID NO: 1514, SEQ ID NO: 1515, SEQ ID NO: 1585,
SEQ ID NO: 1586, SEQ ID NO: 1656, SEQ ID NO: 1657, SEQ ID NO: 1678,
and SEQ ID NO: 1695.
These proteins or polypeptides are O-157:H7-specific regulatory
elements and are usable for development of a substance that
inhibits expression of their genes. Such a substance is useful
for prevention and therapy of O-157 infection, and as a food
additive. Furthermore, the protein and its gene (or nucleic-acid
molecule) per se are useful for diagnosis and therapy of O-157
infection.
[0035]

6) Proteins relating to fimbriae:

These proteins or polypeptides are selected from the
group comprising the following sequence list:
SEQ ID NO: 274, SEQ ID NO: 275, SEQ ID NO: 276, SEQ ID NO: 277,
SEQ ID NO: 1195, SEQ ID NO: 1196, SEQ ID NO: 1197, SEQ ID NO: 1241,
SEQ ID NO: 1242, SEQ ID NO: 1243, SEQ ID NO: 1244, SEQ ID NO: 1245,
SEQ ID NO: 1246, SEQ ID NO: 1247, SEQ ID NO: 1248, SEQ ID NO: 1249,
SEQ ID NO: 1250, SEQ ID NO: 1251, SEQ ID NO: 1252, SEQ ID NO: 1253,
SEQ ID NO: 1254, SEQ ID NO: 1255, SEQ ID NO: 1256, SEQ ID NO: 1257,
SEQ ID NO: 1427, SEQ ID NO: 1428, SEQ ID NO: 1429, SEQ ID NO: 1430,
SEQ ID NO: 1431, SEQ ID NO: 1432, SEQ ID NO: 1433, SEQ ID NO: 1434,
SEQ ID NO: 1435, SEQ ID NO: 1521, SEQ ID NO: 1522, SEQ ID NO: 1523,
SEQ ID NO: 1524, SEQ ID NO: 1525, SEQ ID NO: 1548, SEQ ID NO: 1613,
SEQ ID NO: 1614, SEQ ID NO: 1659, and SEQ ID NO: 1671.
These proteins and their genes (or nucleic-acid molecules) are
useful for production of an antibody or a vaccine composition,
diagnosis of O-157 infection, and the like. These proteins or
polypeptides are also usable for development of a substance
that inhibits expression of O-157:H7-specific genes. Such a
substance is useful for prevention and therapy of O-157
infection, and as a food additive. Furthermore, the protein and
its gene (or nucleic-acid molecule) per se are useful for
diagnosis and therapy of O-157 infection.
[0036]

7) Proteins relating to transportation of substance:

These proteins or polypeptides are selected from the
group comprising the following sequence list:
SEQ ID NO: 817, SEQ ID NO: 818, SEQ ID NO: 819, SEQ ID NO: 820,
SEQ ID NO: 821, SEQ ID NO: 822, SEQ ID NO: 823, SEQ ID NO: 824,
SEQ ID NO: 825, SEQ ID NO: 826, SEQ ID NO: 827, SEQ ID NO: 828,
SEQ ID NO: 829, SEQ ID NO: 830, SEQ ID NO: 831, SEQ ID NO: 832,
SEQ ID NO: 833, SEQ ID NO: 834, SEQ ID NO: 835, SEQ ID NO: 836,
SEQ ID NO: 837, SEQ ID NO: 838, SEQ ID NO: 839, SEQ ID NO: 840,
SEQ ID NO: 841, SEQ ID NO: 842, SEQ ID NO: 843, SEQ ID NO: 844,
SEQ ID NO: 1198, SEQ ID NO: 1339, SEQ ID NO: 1340, SEQ ID NO: 1341,
SEQ ID NO: 1342, SEQ ID NO: 1343, SEQ ID NO: 1344, SEQ ID NO: 1345,
SEQ ID NO: 1346, SEQ ID NO: 1347, SEQ ID NO: 1368, SEQ ID NO: 1369,
SEQ ID NO: 1370, SEQ ID NO: 1371, SEQ ID NO: 1458, SEQ ID NO: 1459,
SEQ ID NO: 1461, SEQ ID NO: 1462, SEQ ID NO: 1463, SEQ ID NO: 1464,
SEQ ID NO: 1465, SEQ ID NO: 1466, SEQ ID NO: 1507, and SEQ ID NO: 1679.
These proteins or polypeptides are regulatory elements specific
to O-157:H7. These [proteins or polypeptides] are useful for
development of a selection medium specific to O-157 or of a
pharmaceutical agent selective to O-157, and a strain comprising
a disruption in their genes may be useful as a live attenuated
vaccine. Furthermore, the protein and its gene (or nucleic-acid
molecule) per se are useful for diagnosis and therapy of O-157
infection.
[0037] 

8) Proteins relating to synthesis of lipopolysaccharides:

These proteins or polypeptides are selected from the
group comprising the following sequence list:
SEQ ID NO: 1533, SEQ ID NO: 1534, SEQ ID NO: 1535, SEQ ID NO: 1536,
SEQ ID NO: 1395, SEQ ID NO: 1396, SEQ ID NO: 1397, SEQ ID NO: 1398,
SEQ ID NO: 1399, SEQ ID NO: 1400, SEQ ID NO: 1412, SEQ ID NO: 1413,
SEQ ID NO: 1414, SEQ ID NO: 1415, SEQ ID NO: 1564, and SEQ ID NO: 1565.
These proteins and their genes (or nucleic-acid molecules) are
especially useful for production of an antibody or a vaccine
composition, diagnosis of O-157 infection, and the like.
Furthermore, the protein and its gene (or nucleic-acid molecule)
per se are useful for diagnosis and therapy of O-157 infection.
[0038] 

9) Proteins relating to metabolism: 

These proteins or polypeptides are selected from the
group comprising the following sequence list:
SEQ ID NO: 278, SEQ ID NO: 690, SEQ ID NO: 691, SEQ ID NO: 692,
SEQ ID NO: 693, SEQ ID NO: 694, SEQ ID NO: 695, SEQ ID NO: 696,
SEQ ID NO: 697, SEQ ID NO: 698, SEQ ID NO: 699, SEQ ID NO: 700,
SEQ ID NO: 701, SEQ ID NO: 702, SEQ ID NO: 703, SEQ ID NO: 704,
SEQ ID NO: 705, SEQ ID NO: 706, SEQ ID NO: 707, SEQ ID NO: 708,
SEQ ID NO: 709, SEQ ID NO: 710, SEQ ID NO: 711, SEQ ID NO: 712,
SEQ ID NO: 713, SEQ ID NO: 714, SEQ ID NO: 715, SEQ ID NO: 716,
SEQ ID NO: 717, SEQ ID NO: 718, SEQ ID NO: 719, SEQ ID NO: 720,
SEQ ID NO: 721, SEQ ID NO: 722, SEQ ID NO: 723, SEQ ID NO: 724,
SEQ ID NO: 725, SEQ ID NO: 726, SEQ ID NO: 727, SEQ ID NO: 728,
SEQ ID NO: 729, SEQ ID NO: 730, SEQ ID NO: 731, SEQ ID NO: 1416,
SEQ ID NO: 1417, SEQ ID NO: 1472, SEQ ID NO: 1552, SEQ ID NO: 1556,
SEQ ID NO: 1557, SEQ ID NO: 1616, SEQ ID NO: 1630, SEQ ID NO: 1631,
SEQ ID NO: 1660, SEQ ID NO: 1661, and SEQ ID NO: 1667.
These proteins or polypeptides relate to O-157:H7-specific
metabolism. Therefore, these [proteins or polypeptides] are
useful for development of a selection medium specific to O-157
or of a pharmaceutical agent selective to O-157, and a strain
comprising a disruption in their genes may be useful as a live
attenuated vaccine. Moreover, the protein or its gene (or
nucleic-acid molecule) per se is useful for diagnosis and
therapy of O-157 infection.
[0039] 

9195 10) [heading illegible]: 

These proteins or polypeptides are selected from a group 
comprising the following sequence list: SEQ ID NO: 732, SEQ 
ID NO: 733, SEQ ID NO: 734, SEQ ID NO: 735, SEQ ID NO: 736, 
SEQ ID NO: 737, SEQ ID NO: 738, SEQ ID NO: 739, SEQ ID 

9200 NO: 740, SEQ ID NO: 741, SEQ ID NO: 742, SEQ ID NO: 743, 
SEQ ID NO: 744, SEQ ID NO: 745, SEQ ID NO: 1199, SEQ ID 
NO: 1200, SEQ ID NO: 1201, SEQ ID NO: 1202, SEQ ID NO: 
1203, SEQ ID NO: 1204, SEQ ID NO: 1205, and SEQ ID 
NO: 1318. These [proteins or polypeptides] are useful for 
9205 development of a pharmaceutical agent selective to O-157. 

Furthermore, the protein and its gene (or nucleic-acid molecule) 
per se are useful for diagnosis and therapy of O-157 infection. 
[0040] 

11) Proteins relating to pathogenicity: 

9210 These proteins or polypeptides are selected from a group 

comprising the following sequence list: SEQ ID NO: 746, SEQ 
ID NO: 747, SEQ ID NO: 748, SEQ ID NO: 749, SEQ ID NO: 750, 
SEQ ID NO: 751, SEQ ID NO: 752, SEQ ID NO: 753, SEQ ID 
NO: 754, SEQ ID NO: 845, SEQ ID NO: 846, SEQ ID NO: 847, 

9215 SEQ ID NO: 848, SEQ ID NO: 849, SEQ ID NO: 850, SEQ ID 
NO: 851, SEQ ID NO: 852, SEQ ID NO: 853, SEQ ID NO: 854, 
SEQ ID NO: 855, SEQ ID NO: 856, SEQ ID NO: 857, SEQ ID 
NO: 858, SEQ ID NO: 859, SEQ ID NO: 860, SEQ ID NO: 861, 
SEQ ID NO: 862, SEQ ID NO: 863, SEQ ID NO: 864, SEQ ID 

9220 NO: 865, SEQ ID NO: 866, SEQ ID NO: 867, SEQ ID NO: 868, 
SEQ ID NO: 869, SEQ ID NO: 870, SEQ ID NO: 871, SEQ ID 
NO: 872, SEQ ID NO: 873, SEQ ID NO: 874, SEQ ID NO: 875, 
SEQ ID NO: 1129, SEQ ID NO: 1130, SEQ ID NO: 1131, SEQ ID 
NO: 1132, SEQ ID NO: 1133, SEQ ID NO: 1134, SEQ ID NO: 

9225 1135, SEQ ID NO: 1136, SEQ ID NO: 1137, SEQ ID NO: 1138, 
SEQ ID NO: 1206, SEQ ID NO: 1207, SEQ ID NO: 1208, SEQ ID 
NO: 1209, SEQ ID NO: 1210, SEQ ID NO: 1211, SEQ ID NO: 
1310, SEQ ID NO: 1311, SEQ ID NO: 1312, SEQ ID NO: 1313, 
SEQ ID NO: 1314, SEQ ID NO: 1315, SEQ ID NO: 1316, SEQ ID 

9230 NO: 1317, SEQ ID NO: 1321, SEQ ID NO: 1322, SEQ ID NO: 
1323, SEQ ID NO: 1324, SEQ ID NO: 1325, SEQ ID NO: 1326, 
SEQ ID NO: 1327, SEQ ID NO: 1328, SEQ ID NO: 1527, SEQ ID 
NO: 1528, SEQ ID NO: 1529, SEQ ID NO: 1530, SEQ ID NO: 
1531, SEQ ID NO: 1620, SEQ ID NO: 1621, SEQ ID NO: 1674, 

9235 and SEQ ID NO: 1686. These proteins or polypeptides 
relate to pathogenicity of O-157. Therefore, these [proteins 



or polypeptides] are useful for development of a pharmaceutical 
agent selective to O-157 and the like. Furthermore, a strain 
comprising disruption in their genes may be useful as a live 
9240 attenuated vaccine. Moreover, the protein or its gene (or 
nucleic-acid molecule) per se are useful for diagnosis and 
therapy of O-157 infection. 
[0041] 

12) Other proteins: 

9245 These proteins or polypeptides are selected from a group 

comprising the following sequence list: SEQ ID NO: 1014, SEQ 
ID NO: 1015, SEQ ID NO: 1016, SEQ ID NO: 1017, SEQ ID NO: 
1018, SEQ ID NO: 1019, SEQ ID NO: 1020, SEQ ID NO: 1021, 
SEQ ID NO: 1022, SEQ ID NO: 1023, SEQ ID NO: 1024, SEQ ID 

9250 NO: 1025, SEQ ID NO: 1139, SEQ ID NO: 1140, SEQ ID NO: 
1141, SEQ ID NO: 1142, SEQ ID NO: 1143, SEQ ID NO: 1144, 
SEQ ID NO: 1145, SEQ ID NO: 1146, SEQ ID NO: 1319, SEQ ID 
NO: 1320, SEQ ID NO: 1381, SEQ ID NO: 1382, SEQ ID NO: 
1383, SEQ ID NO: 1384, SEQ ID NO: 1385, SEQ ID NO: 1469, 

9255 SEQ ID NO: 1470, SEQ ID NO: 1546, SEQ ID NO: 1592, SEQ ID 
NO: 1593, SEQ ID NO: 1687, and SEQ ID NO: 1689. These 
proteins and their genes (or nucleic-acid molecules) are useful 
for detection and diagnosis of O-157 infection. 
[0042] 

9260 According to a standard technique in the art, the 

polypeptide of the present invention or a fragment thereof may 
be produced by inserting the nucleic-acid molecule of the 
present invention which encodes [the polypeptide or fragment] 
into a suitable expression vector, introducing the obtained 

9265 recombinant vector into suitable host cells, culturing the host 
cells, and subsequently extracting the desired polypeptide or 
fragment thereof from the cultured host cells. Therefore, the 
present invention also relates to a method of producing an 
O-157:H7 specific polypeptide comprising a recombinant 

9270 expression vector containing the nucleic-acid molecule of the 
present invention as an insert, host cells 
transformed with the vector, and cultivation of the host cells. 
[0043] 

In order to produce the O-157 specific polypeptide of the 

9275 present invention or a fragment thereof by using a recombinant 
technique, any expression system may be used, for example, 
eukaryotic cells such as mammalian cells including human cells, 
insect cells, fungal cells, yeast cells and the like, as well as 
prokaryotic cells such as E. coli cells and the like. 

9280 The prokaryotic cells may be any bacterial cells known 
in the art. The cells include, for example, species of E. 
coli, Salmonella, Nocardia, Corynebacterium, Campylobacter, and 
Streptomyces (for example, Sambrook, Fritsch & Maniatis, 
Molecular Cloning: A Laboratory Manual, 2nd Ed., 1989). 

9285 Examples of mammalian cells include COS7 cells or CHO cells. 

In the case of [using] these cells, useful conventional promoters 
may be used for expression in mammalian cells. It is 
preferable that, for example, the immediate early promoter of 
human cytomegalovirus (HCMV) is used. In addition, as a 

9290 promoter for gene expression in mammalian cells which can be 
used in the present invention, virus promoters such as those of 
retrovirus, polyoma virus, adenovirus, simian virus 40 (SV40) 
and the like, or promoters derived from mammalian cells such 
as human peptide chain elongation factor 1α (HEF-1α) and the 

9295 like may be used. As a replication origin (ori), an ori derived 
from SV40, polyomavirus, adenovirus, or bovine papillomavirus 
may be used. In addition, the expression vector may include a 
gene of phosphotransferase APH(3') II or I (neo) and the like as 
a selection marker. 

9300 [0044] 

It is preferable that the recombinant expression vectors 
of the present invention include DNA sequences encoding 
various antibiotic resistance genes or other marker genes as 
selection marker genes. Examples of the marker genes include 
9305 spectinomycin resistance gene, ampicillin resistance gene, 
streptomycin resistance gene (streptomycin phosphotransferase 
(SPT) gene), neomycin phosphotransferase (NPTII) gene for 
resistance to kanamycin or geneticin, hygromycin 
phosphotransferase (HTP) gene for hygromycin resistance, 

9310 thymidine kinase (TK) gene, E. coli xanthine guanine 
phosphoribosyl transferase (Ecogpt) gene, dihydrofolate 
reductase (DHFR) gene, β-glucuronidase gene, luciferase gene, 
β-galactosidase gene, peroxidase gene and the like. 
[0045] 

9315 In order to detect O-157, oligonucleotide primers for PCR 

can be constructed by using an O-157 specific sequence in the 
nucleic-acid molecule or the gene of the present invention to 
perform rapid diagnosis of O-157. Basically, all of the O-157 
specific sequences may be useful for a method for the rapid 

9320 diagnosis by PCR. Therefore, the present invention relates to 
a method for detection or diagnosis of O-157 infection using the 
above-mentioned oligonucleotide primer. Furthermore, the 
oligonucleotide may be used as a hybridization probe. The 
length of the oligonucleotide of the present invention is at least 8 

9325 nucleotides, preferably 15 or more nucleotides, but may be 
determined, as necessary, by reference to a standard technique 
in genetic engineering. 
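
As an illustration only, the length criteria stated above (at least 8 nucleotides, preferably 15 or more) can be written as a short Python sketch. The function names, the restriction to the letters A, C, G and T, and the sliding-window enumeration are assumptions made for this example and are not part of the disclosed method. Specificity to O-157 would still have to be confirmed separately.

```python
"""Illustrative sketch: length checks for candidate oligonucleotides."""

MIN_LENGTH = 8         # minimum oligonucleotide length stated in the text
PREFERRED_LENGTH = 15  # preferred minimum length stated in the text


def is_valid_oligo(seq: str) -> bool:
    """Return True if seq contains only A/C/G/T and has at least 8 bases."""
    seq = seq.upper()
    return len(seq) >= MIN_LENGTH and set(seq) <= set("ACGT")


def is_preferred_oligo(seq: str) -> bool:
    """Return True if seq is valid and has at least 15 bases."""
    return is_valid_oligo(seq) and len(seq) >= PREFERRED_LENGTH


def candidate_oligos(region: str, length: int = PREFERRED_LENGTH) -> list:
    """Enumerate every window of `length` bases in a (hypothetical) specific region."""
    if length < MIN_LENGTH:
        raise ValueError("length is below the 8-nucleotide minimum")
    region = region.upper()
    return [region[i:i + length] for i in range(len(region) - length + 1)]
```

A region shorter than the requested length simply yields an empty list, so the caller can tell that no candidate of that length exists.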
[0046] 

In addition to a nucleic-acid molecule having an O-157 
9330 specific nucleic acid sequence, the present invention also 
relates to a nucleic acid sequence which is also present in 
other E. coli (for example, strain K-12) but comprises an O-157 
specific mutation, and a method of using it. Such nucleic acid 
sequences include, for example, a nucleic acid sequence 
9335 comprising a mutation in genes relating to decreased 
utilization of sorbitol and lack of β-glucuronidase activity. 
[0047] 

The O-157 specific nucleic-acid molecule of the present 
invention, a gene included in it, and the peptide and nucleic-acid 
9340 sequence encoded by the gene are useful for diagnosis and/or 
therapy of O-157 infection and prevention of symptoms caused 
by the infection. They can also be used for detection of the 
presence of O-157 in a sample and classification of its strain. 
Furthermore, they can also be used for screening of useful 
9345 compounds for prevention and/or therapy of O-157 infection and 
symptoms caused by the infection. 
[0048] 

The present invention also relates to an oligonucleotide 
useful as a primer or a probe for detecting O-157 infection. 

9350 Furthermore, the scope of the present invention includes a 
vaccine composition including genes and/or polynucleotides of 
the present invention, and a method for prevention and/or 
therapy of O-157 infection and symptoms caused by the 
infection. 

9355 [0049] 

Accordingly, the present invention relates to an 
oligonucleotide or polynucleotide comprising a nucleotide 
sequence constituted of at least 8 nucleotides in an O-157 specific 
nucleotide sequence set forth in the sequence lists, [a 

9360 nucleotide sequence] comprising an O-157 specific mutation, or a 
nucleic-acid sequence complementary to these nucleic-acid 
sequences. The present invention also relates to use of the 
oligonucleotide or polynucleotide of the present invention 
as a hybridization probe or a PCR primer. The oligonucleotide 

9365 used as a primer is comprised of at least 8 nucleotides, 
preferably 15 nucleotides, more preferably 20 or more 
nucleotides. The probe is comprised of at least 20 to 30 
nucleotides. Nucleic acids used as a probe may be labeled by 
using a standard technique in the art. 
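
The notion of a complementary nucleic-acid sequence used above can be illustrated with a minimal Python sketch. It assumes plain A/C/G/T strings and perfect base pairing only; the function names are hypothetical, and real hybridization also depends on conditions such as temperature and salt concentration, which this sketch ignores.

```python
"""Illustrative sketch: reverse complement and perfect-match probe sites."""

# Watson-Crick pairing table for DNA (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def probe_sites(probe: str, target: str) -> list:
    """Return 0-based positions on `target` where `probe` would pair perfectly."""
    site = reverse_complement(probe).upper()
    target = target.upper()
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        start = target.find(site, start + 1)  # allow overlapping sites
    return positions
```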

9370 [0050] 

Using the oligonucleotide or polynucleotide of the present 
invention as a PCR primer, rapid diagnosis of O-157 may be 
performed. Basically, all O-157 specific sequences may be 
useful for a method for rapid diagnosis by PCR. Therefore, the 
9375 present invention relates to a method for detection or diagnosis 
of O-157 infection using the oligonucleotide primer. 
[0051] 

The present invention relates to a peptide vaccine 
formulation for prevention or therapy of O-157 infection 

9380 comprising an effective amount of at least one kind of O-157 
specific polypeptide having an amino acid sequence set forth in 
the sequence lists or a fragment thereof. The vaccine 
formulation preferably includes a pharmaceutically acceptable 
carrier, for example, an adjuvant known in the art. 

9385 [0052] 

The present invention also relates to a DNA vaccine 
formulation for prevention or therapy of O-157 infection 
comprising a polynucleotide encoding at least one of the 
above-mentioned O-157 specific polypeptides or fragments thereof. 

9390 The vaccine formulation preferably contains a pharmaceutically 
acceptable carrier, for example, an adjuvant and/or a 
transfection reagent and the like which are known in the art. 
The transfection reagent includes a liposome, a gold particle, 
and a cationic polymer suitable for transfecting a living cell 

9395 with the DNA vaccine. Use of a DNA vaccine against pathogenic 
bacteria is disclosed in, for example, studies of 
DNA vaccines: Han T. K. et al., DNA Cell Biol. 20(9), pp. 595-601, 
2001; Miyaji E. N. et al., Vaccine 20(5-6), pp. 805-12, 2001, 
which are incorporated herein in their entirety by reference 

9400 thereto. 
[0053] 

The present invention relates to a method of reducing the 
risk of O-157 infection in patients or a method for therapy [of 
the infection]. This method comprises administration of the 
9405 vaccine formulation of the present invention to a patient so as 
to reduce the risk of O-157 infection or provide therapy of 
infection. 
[0054] 

In another embodiment, the present invention relates to a 
9410 method of producing the vaccine formulation of the present 
invention. The method of producing the peptide vaccine 
formulation includes combining at least one kind of O-157 
specific polypeptide having the amino acid sequences set forth 
in the sequence list and the fragments thereof with a 
9415 pharmaceutically acceptable carrier. 
[0055] 

The method of producing the DNA vaccine formulation 
includes inserting a polynucleotide encoding at least one kind of 
the polypeptides or the fragments thereof into an expression 

9420 vector which can be expressed in a patient, and combining an 
effective amount of the expression vector with a 
pharmaceutically acceptable carrier. Codon usage frequency 
may differ between mammals, 
including humans, and E. coli. In this case, it is possible to 

9425 improve the efficiency of translation of mRNA into a desired 
polypeptide in a patient who is to be treated for or protected 
from O-157 infection by replacing codons of high frequency in 
O-157 with codons of high frequency in mammals using a 
standard technique in genetic engineering. A sequence such as 

9430 intron A derived from cytomegalovirus may be included in the 
expression vector to enhance the expression of the desired 
polypeptide. In the case where the DNA vaccine composition of 
the present invention is administered to a human, the 
recombinant expression vector is preferably [a vector] having a 

9435 replication origin other than that of SV40. A sequence derived 
from SV40 is not preferable, since there is a possibility that it 
has carcinogenicity. The replication origins usable for this 
purpose include, but are not restricted to, replication origins 
derived from, for example, other viruses, prokaryotic cells, and 

9440 eukaryotic cells such as yeast cells or animal cells. 
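
The codon replacement described above can be sketched in Python as a synonymous substitution over a coding sequence. The preference table below is a hypothetical placeholder covering only three amino acids; the specification gives no codon table, and an actual design would rely on measured codon usage data for the intended host. The function name `recode` is likewise an assumption of this example.

```python
"""Illustrative sketch: synonymous codon replacement in a coding sequence."""

# Hypothetical preference table: for each amino acid, the codon chosen for the
# target host and the full set of its synonymous codons (standard genetic code).
# The chosen codons are placeholders, not data from the specification.
PREFERRED = {
    "Leu": ("CTG", {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}),
    "Ala": ("GCC", {"GCT", "GCC", "GCA", "GCG"}),
    "Lys": ("AAG", {"AAA", "AAG"}),
}

# Flatten the table into a codon -> replacement codon lookup.
_REPLACEMENT = {
    codon: best for best, group in PREFERRED.values() for codon in group
}


def recode(cds: str) -> str:
    """Replace each listed codon by its preferred synonym; leave others as is."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join(_REPLACEMENT.get(codon, codon) for codon in codons)
```

Because every substitution stays within one synonymous codon set, the encoded amino acid sequence is unchanged; only the nucleotide sequence differs.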
[0056] 

The present invention also relates to an antibody 
selectively reacting with the O-157 specific polypeptide or the 
fragment thereof. An anti-protein/anti-peptide antiserum or 

9445 monoclonal antibody can be prepared according to a standard 
protocol (see, for example, Antibodies: A Laboratory Manual, 
Harlow & Lane eds., Cold Spring Harbor Press, 1988). In the 
present invention, the term "antibody molecule" 
includes whole antibodies, antibody fragments obtained by 

9450 fragmentation using a conventional technique, for example, Fab' 
and F(ab')2 fragments, and single-chain Fv (scFv) obtained by a 
genetic engineering technique. The antibody molecule of 
the present invention also includes an antibody fragment, a 
bispecific antibody comprising single-chain Fv, or a chimeric 

9455 antibody. In this case, [the antibody molecule of the present 
invention] comprises two different antibodies against the same 
O-157 specific polypeptide, two antibodies recognizing different 
O-157 polypeptides, or one antibody against the polypeptide 
and one antibody recognizing an epitope which does not relate 

9460 to O-157. 
[0057] 

A gene relating to an O-157 specific metabolic function among 
the O-157 specific genes is usable for development of a novel medium 
for selection of O-157. Although the selection media used at 

9465 present rely on comparatively specific properties of 
O-157 such as decreased utilization of sorbitol, lack of 
β-glucuronidase activity, and resistance to tellurite, 
there is a possibility that a further [property] specific to O-157 is 
present in the genes of the metabolic system found in the present 

9470 invention. Such a property is preferable for selection of O-157 and, 

preferably, is combined with decreased utilization of sorbitol, 
lack of β-glucuronidase activity and/or resistance 
to tellurite. 
[0058] 
9475 A polypeptide relating to pathogenicity of O-157, a 

bacterial surface protein, a regulatory protein, a protein 
relating to the metabolic system, and a nucleic-acid molecule 
encoding any of these [proteins] are useful for development of a 
pharmaceutical agent which selectively inhibits expression of 

9480 pathogenicity of O-157. Therefore, the present invention 
includes a method of searching or screening for a pharmaceutical 
agent useful for prevention and/or therapy of symptoms relating 
to O-157. According to the method of the present invention, 
a novel preventive agent and/or therapeutic agent for symptoms 

9485 relating to O-157 may be provided. 
[0059] 

In addition, it is possible to produce a 
recombinant protein from a gene relating to pathogenicity 
shown by the present invention, especially a novel toxin, to 

9490 analyse a function of the toxin, and to search for an inhibitor of the 
toxin. Therefore, the present invention relates to a method of 
searching or screening for an inhibitor of the novel toxin. 
Furthermore, it is possible to determine the conformation on the 
basis of a purified protein and its amino acid 

9495 sequence information and to design and synthesise 
inhibitory substances using a computer. These inhibitory substances will 
be not only a therapeutic agent of a completely different type 
from conventional antibiotics, but also a food additive 
selectively inhibiting growth of O-157. 

9500 [0060] 

In addition, the O-157 specific pathogenic gene, the gene 
of bacterial cell surface protein and the regulatory gene of the 
present invention may [be used for] developing a live 
attenuated vaccine by preparing a disruptant thereof. 
9505 Furthermore, a live attenuated vaccine may also be produced by 
cloning a dysfunctional gene corresponding to them into another 
vaccine strain. 
[0061] 
On the other hand, a gene encoding an essential 
9510 metabolic function for proliferation of O-157 in vivo or in vitro 
or a regulatory gene may be [used for] preparing a mutant 
which can proliferate under a specific condition in the laboratory, 
but cannot proliferate in a living mammalian body, including 
that of a human, by preparing a strain comprising a disruption in 
9515 these genes. Such a strain is useful as a live attenuated vaccine. 
[0062] 

In an embodiment of the present invention, a DNA 
microarray or DNA chip includes a part or all of the nucleic 
acid sequence or gene of the present invention. Preferably, 

9520 there is provided a DNA chip or a method for producing the 
DNA chip, wherein the DNA chip comprises 

(a) a nucleotide sequence which is selected from a group 
comprising the following SEQ IDs or a partial sequence thereof: 
SEQ ID NO: 1, SEQ ID NO: 132, SEQ ID NO: 244, SEQ ID NO: 

9525 337, SEQ ID NO: 410, SEQ ID NO: 484, SEQ ID NO: 554, SEQ ID 
NO: 630, SEQ ID NO: 689, SEQ ID NO: 755, SEQ ID NO: 816, 
SEQ ID NO: 876, SEQ ID NO: 927, SEQ ID NO: 978, SEQ ID NO: 
1013, SEQ ID NO: 1029, SEQ ID NO: 1055, SEQ ID NO: 1060, 
SEQ ID NO: 1093, SEQ ID NO: 1128, SEQ ID NO: 1157, SEQ ID 

9530 NO: 1191, SEQ ID NO: 1212, SEQ ID NO: 1240, SEQ ID NO: 1258, 
SEQ ID NO: 1274, SEQ ID NO: 1288, SEQ ID NO: 1302, SEQ ID 
NO: 1309, SEQ ID NO: 1321, SEQ ID NO: 1329, SEQ ID NO: 1338, 
SEQ ID NO: 1348, SEQ ID NO: 1359, SEQ ID NO: 1366, SEQ ID 
NO: 1374, SEQ ID NO: 1380, SEQ ID NO: 1386, SEQ ID NO: 1394, 

9535 SEQ ID NO: 1401, SEQ ID NO: 1408, SEQ ID NO: 1411, SEQ ID 
NO: 1418, SEQ ID NO: 1426, SEQ ID NO: 1436, SEQ ID NO: 1443, 
SEQ ID NO: 1450, SEQ ID NO: 1457, SEQ ID NO: 1460, SEQ ID 
NO: 1467, SEQ ID NO: 1471, SEQ ID NO: 1473, SEQ ID NO: 1478, 
SEQ ID NO: 1487, SEQ ID NO: 1489, SEQ ID NO: 1494, SEQ 

9540 ID NO: 1499, SEQ ID NO: 1501, SEQ ID NO: 1506, SEQ ID NO: 
1508, SEQ ID NO: 1510, SEQ ID NO: 1511, SEQ ID NO: 1516, 
SEQ ID NO: 1520, SEQ ID NO: 1526, SEQ ID NO: 1532, SEQ ID 
NO: 1537, SEQ ID NO: 1540, SEQ ID NO: 1545, SEQ ID NO: 1547, 
SEQ ID NO: 1549, SEQ ID NO: 1551, SEQ ID NO: 1553, SEQ ID 

9545 NO: 1555, SEQ ID NO: 1558, SEQ ID NO: 1563, SEQ ID NO: 1566, 
SEQ ID NO: 1569, SEQ ID NO: 1571, SEQ ID NO: 1576, SEQ ID 
NO: 1580, SEQ ID NO: 1584, SEQ ID NO: 1587, SEQ ID NO: 1591, 
SEQ ID NO: 1594, SEQ ID NO: 1596, SEQ ID NO: 1599, SEQ ID 
NO: 1601, SEQ ID NO: 1603, SEQ ID NO: 1604, SEQ ID NO: 1605, 

9550 SEQ ID NO: 1607, SEQ ID NO: 1612, SEQ ID NO: 1615, SEQ 
ID NO: 1617, SEQ ID NO: 1619, SEQ ID NO: 1622, SEQ ID NO: 
1624, SEQ ID NO: 1626, SEQ ID NO: 1627, SEQ ID NO: 1629, 
SEQ ID NO: 1632, SEQ ID NO: 1635, SEQ ID NO: 1636, SEQ ID 
NO: 1637, SEQ ID NO: 1639, SEQ ID NO: 1640, SEQ ID NO: 1643, 

9555 SEQ ID NO: 1646, SEQ ID NO: 1649, SEQ ID NO: 1652, SEQ 
ID NO: 1655, SEQ ID NO: 1658, SEQ ID NO: 1660, SEQ ID NO: 
1662, SEQ ID NO: 1664, SEQ ID NO: 1666, SEQ ID NO: 1668, 
SEQ ID NO: 1669, SEQ ID NO: 1670, SEQ ID NO: 1672, SEQ ID 
NO: 1673, SEQ ID NO: 1675, SEQ ID NO: 1677, SEQ ID NO: 1680, 

9560 SEQ ID NO: 1682, SEQ ID NO: 1683, SEQ ID NO: 1685, SEQ ID 
NO: 1688, SEQ ID NO: 1690, SEQ ID NO: 1691, SEQ ID NO: 1694, 
SEQ ID NO: 1696, SEQ ID NO: 1699, SEQ ID NO: 1700, SEQ ID 
NO: 1701, SEQ ID NO: 1704, SEQ ID NO: 1705, SEQ ID NO: 1706, 
SEQ ID NO: 1707, SEQ ID NO: 1708, SEQ ID NO: 1709, SEQ ID 

9565 NO: 1710, SEQ ID NO: 1711, SEQ ID NO: 1712, SEQ ID NO: 1713, 
SEQ ID NO: 1715, SEQ ID NO: 1716, SEQ ID NO: 1717, SEQ 
ID NO: 1718, SEQ ID NO: 1719, SEQ ID NO: 1720, SEQ ID NO: 
1721, SEQ ID NO: 1722, SEQ ID NO: 1723, SEQ ID NO: 1724, 
SEQ ID NO: 1725, SEQ ID NO: 1726, SEQ ID NO: 1727, SEQ ID 

9570 NO: 1728, SEQ ID NO: 1729, SEQ ID NO: 1730, SEQ ID NO: 1731, 
SEQ ID NO: 1732, SEQ ID NO: 1733, SEQ ID NO: 1734, SEQ ID 
NO: 1735, SEQ ID NO: 1736, SEQ ID NO: 1737, SEQ ID NO: 1738, 
SEQ ID NO: 1739, SEQ ID NO: 1740, SEQ ID NO: 1741, SEQ ID 
NO: 1742, SEQ ID NO: 1743, SEQ ID NO: 1744, SEQ ID NO: 1745, 

9575 SEQ ID NO: 1746, SEQ ID NO: 1747, SEQ ID NO: 1748, SEQ ID 
NO: 1749, SEQ ID NO: 1750, SEQ ID NO: 1751, SEQ ID NO: 1752, 
SEQ ID NO: 1753, SEQ ID NO: 1754, SEQ ID NO: 1755, SEQ 
ID NO: 1756, SEQ ID NO: 1757, SEQ ID NO: 1758, SEQ ID NO: 
1759, SEQ ID NO: 1760, SEQ ID NO: 1761, SEQ ID NO: 1762, 

9580 SEQ ID NO: 1763, SEQ ID NO: 1764, SEQ ID NO: 1765, SEQ ID 
NO: 1766, SEQ ID NO: 1767, SEQ ID NO: 1768, SEQ ID NO: 1769, 
SEQ ID NO: 1770, SEQ ID NO: 1771, SEQ ID NO: 1772, SEQ 
ID NO: 1773, SEQ ID NO: 1774, SEQ ID NO: 1775, SEQ ID NO: 
1776, SEQ ID NO: 1777, SEQ ID NO: 1778, SEQ ID NO: 1779, 

9585 SEQ ID NO: 1780, SEQ ID NO: 1781, SEQ ID NO: 1782, SEQ ID 
NO: 1783, SEQ ID NO: 1784, SEQ ID NO: 1785, SEQ ID NO: 1786, 
SEQ ID NO: 1787, SEQ ID NO: 1788, SEQ ID NO: 1789, SEQ ID 
NO: 1790, SEQ ID NO: 1791, SEQ ID NO: 1792, SEQ ID NO: 1793, 
SEQ ID NO: 1794, SEQ ID NO: 1795, SEQ ID NO: 1796, SEQ ID 

9590 NO: 1797, SEQ ID NO: 1798, SEQ ID NO: 1799, SEQ ID NO: 1800, 
SEQ ID NO: 1801, SEQ ID NO: 1802, SEQ ID NO: 1803, SEQ ID 
NO: 1804, SEQ ID NO: 1805, SEQ ID NO: 1806, SEQ ID NO: 1807, 
SEQ ID NO: 1808, SEQ ID NO: 1809, SEQ ID NO: 1810, SEQ 
ID NO: 1811, SEQ ID NO: 1812, SEQ ID NO: 1813, SEQ ID NO: 

9595 1814, SEQ ID NO: 1815, SEQ ID NO: 1816, SEQ ID NO: 1817, 
SEQ ID NO: 1818, SEQ ID NO: 1819, SEQ ID NO: 1820, SEQ ID 
NO: 1821, SEQ ID NO: 1822, SEQ ID NO: 1823, SEQ ID NO: 1824, 
SEQ ID NO: 1825, SEQ ID NO: 1826, SEQ ID NO: 1827, SEQ 
ID NO: 1828, SEQ ID NO: 1829, SEQ ID NO: 1830, SEQ ID NO: 

9600 1831, SEQ ID NO: 1832, SEQ ID NO: 1833, SEQ ID NO: 1834, 
SEQ ID NO: 1835, SEQ ID NO: 1836, SEQ ID NO: 1837, SEQ ID 
NO: 1838, SEQ ID NO: 1839, SEQ ID NO: 1840, SEQ ID NO: 1841, 
SEQ ID NO: 1842, SEQ ID NO: 1843, SEQ ID NO: 1844, SEQ ID 
NO: 1845, SEQ ID NO: 1846, SEQ ID NO: 1847, SEQ ID NO: 1848, 

9605 SEQ ID NO: 1849, SEQ ID NO: 1850, SEQ ID NO: 1851, SEQ ID 
NO: 1852, SEQ ID NO: 1853, SEQ ID NO: 1854, SEQ ID NO: 1855, 
SEQ ID NO: 1856, SEQ ID NO: 1857, SEQ ID NO: 1858, SEQ ID 
NO: 1859, SEQ ID NO: 1860, SEQ ID NO: 1861, SEQ ID NO: 1862, 
SEQ ID NO: 1863, SEQ ID NO: 1864, SEQ ID NO: 1865, and 

9610 SEQ ID NO: 1866, 
and/or (b) an oligonucleotide or polynucleotide 
comprising a sequence complementary to the sequences set forth 
in (a). Such a DNA microarray or DNA chip may be produced 
using the nucleic acid sequence or gene of the present invention 

9615 by a standard technique in the art (see, for example, "DNA 
Microarrays: A Practical Approach", Mark Schena, ed. Oxford: 
Oxford University Press, 1999, ISBN 0-19-963777-8; 
"Microarray Biochip Technology", Mark Schena, ed. Natick, MA: 
Eaton Publishing, 2000, ISBN 1-881299-37-6; "DNA Arrays: 

9620 Methods and Protocols", Jang B. Rampal, ed. Totowa, NJ: 
Humana Press, 2001, ISBN 0-89603-822-X). The DNA 
microarray or DNA chip is usable for analysis of the function of 
an O-157 specific gene, classification of O-157 strains, and search for 
the presence or absence of a gene which is similar to that of 

9625 another strain of O-157 or another type of E. coli strain. 

The classification of strains using a DNA array is disclosed in, for 
example, Salama N. et al., Proc. Natl. Acad. Sci. USA 97(26), pp. 
14668-73, 2000. A technique for detecting a pathogenic 
bacterium by using a DNA array is disclosed in, for example, 

9630 Call D. R. et al., Int. J. Food Microbiol. 67(1-2), pp. 71-80, 2001. 

A technique for analysing expression of a gene using a DNA array 
is disclosed in, for example, Harrington C. A. et al., Curr. Opin. 
Microbiol. 3(3), pp. 285-91, 2000. The entirety of these 
documents is incorporated herein by reference. 

9635 [0063] 

Definition 

In the present invention, the terms "O-157 specific" and 
"specific to O-157:H7" mean that [a substance is] absent from 
nonpathogenic E. coli K-12, but is present in O-157 (or 
9640 O-157:H7). Therefore, there is a possibility that 
the same substance or a similar substance is present in another 
type of E. coli or another strain of bacteria. 
[0064] 

In the present invention, the term "hybridize" means that 
9645 hybridization is performed under a stringent condition, for 
example, in 0.5×SSC solution at 65°C or an equivalent condition. 
[0065] 

The term "(cell) surface protein" used herein means all 
proteins capable of reaching the surface, such as inner 
9650 membrane and outer membrane proteins, proteins which bind to 
the cell wall, and secretory proteins. 
[0066] 

The term "open reading frame (ORF)" means a region in 
nucleic acids encoding a polypeptide or a part thereof. The 
9655 ORF can be determined as [a region] from an initiation codon to 
a termination codon or from a termination codon to a termination 
codon. 
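
The first of the two ORF definitions above (initiation codon to termination codon) can be sketched in Python as follows. The sketch assumes ATG as the only initiation codon, the standard termination codons TAA, TAG and TGA, and a scan of the three forward reading frames only; these choices, the function name and the `min_codons` parameter are assumptions of this example. The stop-to-stop definition and the reverse strand are not covered.

```python
"""Illustrative sketch: ORFs from an initiation codon to a termination codon."""

START_CODON = "ATG"                 # assumed sole initiation codon
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard termination codons


def find_orfs(seq: str, min_codons: int = 1) -> list:
    """Return (start, end) pairs, 0-based with end exclusive, stop codon included."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count coding codons, excluding the stop codon itself.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs
```

An ATG that is never followed by an in-frame termination codon is not reported, since the region is then open-ended.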
[0067] 

The term "coding sequence" used herein means nucleic 
9660 acids which are transcribed into mRNAs and/or translated into 
polypeptides in the case where the coding sequence is placed under 
regulation of a suitable regulatory sequence. The coding 
sequence includes, but is not restricted to, mRNA, synthetic DNA, 
and recombinant nucleic acid sequences. 
9665 [0068] 

In the present application, the terms "a part" or 
"fragment" of a polypeptide mean an oligopeptide or polypeptide 
comprising at least 10 amino acid residues, preferably at least 
20 amino acid residues, more preferably at least 40 amino acid 
9670 residues. Furthermore, the terms "a part" or "fragment" of a 
nucleotide sequence also mean a nucleotide sequence 
comprising at least 20 nucleotides, preferably 50 or 
more nucleotides. 
[0069] 

9675 In the present application, the term "expression 

regulatory element" or "expression regulatory sequence" means 
a sequence capable of inducing and/or regulating expression of 
a coding sequence or ORF linked thereto. The term "linked in 
their action" means that the above-mentioned expression regulatory 
9680 element or [expression regulatory] sequence is linked to a 
coding sequence or ORF in a manner such that the coding 
sequence or ORF can be transcribed. 
[0070] 

In the present invention, metabolism of a substance 
9685 means any aspect including expression, function, action or 
regulation of the substance. The metabolism of a substance 
includes modification of a substance, for example, modifying 
the substance with a covalent bond or a noncovalent bond. The 
metabolism of a substance includes modification in other 
9690 substances induced by the substance, for example, modifying 
the other substances with a covalent bond or a noncovalent 
bond. The metabolism of a substance also includes alteration in 
distribution of the substance. The metabolism of a substance 
includes alteration in distribution of other substances induced 
9695 by the substance. 
[0071] 

In the present invention, transportation of a substance 
means transportation of a substance from extracellular space to 
intracellular space, transportation of a substance within a cell, 
9700 and secretion and release of a substance to extracellular space. 
[0072] 

In carrying out the present invention, common 
techniques in the art may be applied unless 
otherwise indicated. Such techniques are disclosed in 

9705 Sambrook, Fritsch & Maniatis, Molecular Cloning: A Laboratory 
Manual, 2nd Ed. (1989); DNA Cloning, Volume (D.N. Glover Ed. 
1985); Oligonucleotide Synthesis (M.J. Gait Ed. 1984); Nucleic 
Acid Hybridization (B.D. Hames & S.J. Higgins Ed. 1984); 
Methods in Enzymology (Academic Press, Inc.), Vol. 154 & Vol. 

9710 155 (Wu & Grossman ed.) and PCR: A Practical Approach 
(McPherson, Quirke & Taylor, ed. 1991). 
[0073] 
The nucleic-acid molecule of the present invention may be 
directly obtained from the DNA of the above-mentioned O157:H7 

9715 Sakai by using the polymerase chain reaction (PCR). Reliability 
of the amplified product may be checked by a conventional 
sequencing method. A clone having a desired sequence 
set forth in the present invention may also be obtained by 
library screening using PCR or by library screening using a 

9720 synthetic oligonucleotide probe against library colonies or plaques 
lifted onto a filter, as known in the art (for example, Sambrook 
et al., Molecular Cloning: A Laboratory Manual, 2nd edition, 1989, 
Cold Spring Harbor Press, NY). Nucleic acids encoding the 
polypeptides specific to O-157:H7 can also be obtained. 

9725 [0074] 

The nucleic acids of the present invention may also be 
chemically synthesized by using a standard technique. Various 
methods for chemical synthesis of polydeoxynucleotides are 
known (see, for example, Itakura et al., U.S. Patent 
9730 No. 4,598,049; Caruthers et al., U.S. Patent No. 4,458,066; and 
Itakura et al., U.S. Patent No. 4,401,796 and No. 4,373,071, 
incorporated by reference herein). 
[0075] 

The present invention is explained by, but not restricted 
9735 to, the following examples. 
[0076] 
[Examples] 

Example 1: Determination of genomic nucleotide sequence of 
enterohemorrhagic pathogenic E. coli O157:H7 

9740 Whole nucleotide sequences on the chromosome of 
enterohemorrhagic E. coli O157:H7 were determined to identify 
regions and nucleic-acid sequences which were specific to 
O157:H7, but absent from nonpathogenic E. coli K-12. The 
following strain was used in the Example: O157:H7 (RIMD 
9745 0509952), which was isolated from a patient suffering from 
typical hemorrhagic enteritis during the outbreak of O157:H7 
infection which occurred mainly in Sakai, Osaka, 1998. The 
strain has been stored in the Research Center for Emerging 
Infectious Diseases, Research Institute for Microbial Diseases, 

9750 Osaka University, and procedure for registration to ATCC 
(American Type Culture Collection) is now proceeding. The 
strain was cultured to prepare genomic DNA according to a 
conventional method. A random shotgun library comprising
inserted DNA fragments of 1-2 kbp in size was prepared to
determine the sequences of 50105 clones. With respect to 19969
clones among them, sequences at both ends of the inserted
fragment were determined (whole genome random shotgun
sequencing). In addition, a library of lambda phage
comprising inserted DNA fragments of about 20 kbp was
prepared to determine the whole sequences of each of 86 clones
individually. The whole sequence data thus obtained were
assembled by using Phred/Phrap/consed to obtain 111 contigs
of 1 kbp or more. Finally, the gap region between each of the
contigs was amplified by using PCR, and the sequences of each
of the PCR products were determined to determine the whole
nucleotide sequence of the chromosome of O157:H7. Then, the
nucleotide sequence was analyzed by using programs such as
Genome Gambler version 1.41, GLIMMER 2.01, BLAST, etc. to
determine protein coding regions. Furthermore, the
chromosomal sequence of O157:H7 was compared to the
chromosomal sequence of nonpathogenic E. coli K-12 (MG1655)
using the MUMmer program to identify all regions of 20 bp or
more which are absent from K-12, but specifically present in
O157:H7. The determined chromosomal nucleotide sequence of
O157:H7 was registered in the gene data bank DDBJ on 26 June,
2000 as Accession number: BA000007.
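
The comparison step in Example 1 (finding regions of 20 bp or
more present in O157:H7 but absent from K-12) can be illustrated
with a minimal Python sketch. It assumes that the genome
comparison has already been reduced to a list of matched
intervals on the O157:H7 chromosome; the function name
uncovered_regions, the interval format (1-based, inclusive) and
the toy coordinates are hypothetical and are not taken from the
source or from the MUMmer program.

```python
def uncovered_regions(genome_length, matches, min_len=20):
    """Return 1-based inclusive (start, end) regions of the query genome
    that are not covered by any match and are at least min_len bp long.

    matches: iterable of (start, end) intervals on the query genome,
    assumed to come from a prior genome-to-genome comparison.
    """
    regions = []
    next_free = 1  # first query position not yet covered by a match
    for start, end in sorted(matches):
        # Gap between the covered prefix and this match.
        if start - next_free >= min_len:
            regions.append((next_free, start - 1))
        next_free = max(next_free, end + 1)
    # Trailing gap after the last match.
    if genome_length - next_free + 1 >= min_len:
        regions.append((next_free, genome_length))
    return regions


# Toy example with hypothetical coordinates (not real genome data).
toy_matches = [(1, 100), (90, 150), (181, 300), (310, 400)]
print(uncovered_regions(420, toy_matches))
```

In this toy input the 9 bp gap between positions 301 and 309 is
discarded because it is shorter than the 20 bp threshold, while
the two gaps of 30 bp and 20 bp are reported.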
[0077]

Example 2: Detection of O-157 by PCR

On the basis of a nucleotide sequence of the urease gene
specifically present in O-157 Sakai, oligonucleotide primers
capable of amplifying the urease gene were synthesized.
Detection of the O-157 specific urease gene by PCR was
performed according to a conventional method using O-157 Sakai
or various strains of E. coli as samples and the synthesized
primers. As a result, the urease gene was detected only in
enterohemorrhagic E. coli including O-157, and not in other
types of E. coli. In addition, it was found that the urease
gene was present in O-157 and closely related strains thereof,
and it was shown that the primers were usable for rapid
identification and diagnosis of O-157.
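
The principle of primer-based detection described in Example 2
can be sketched in Python as a simple in silico check: a product
is expected only when the forward primer and the binding site of
the reverse primer both occur in the template, in that order.
This is only an illustration under stated assumptions: it uses
exact string matching on one strand, and the function names and
toy sequences are hypothetical. They are not the primers or the
urease gene sequence of the source.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def in_silico_pcr(template, fwd_primer, rev_primer):
    """Return the predicted amplicon, or None if no product is expected.

    Assumes exact primer matches on the given strand; real primers
    tolerate mismatches and are affected by annealing conditions.
    """
    start = template.find(fwd_primer)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so on this
    # strand its site appears as the reverse complement.
    rev_site = reverse_complement(rev_primer)
    site = template.find(rev_site, start + len(fwd_primer))
    if site == -1:
        return None
    return template[start:site + len(rev_site)]


# Toy sequences (hypothetical, for illustration only).
positive = "GGATCCATGAAACTGACCCCGTAGTTTAAGCTT"
negative = "GGATCCTTTTTTAAGCTT"
print(in_silico_pcr(positive, "ATGAAACTG", "CTACGG"))
print(in_silico_pcr(negative, "ATGAAACTG", "CTACGG"))
```

The first call returns a predicted product and the second returns
None, mirroring the presence or absence of a band for a strain
that does or does not carry the target gene.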
[0078]

Example 3: [...]

On the basis of the nucleotide sequence information of
O-157 Sakai, oligonucleotide primers specific to O-157 were
synthesized. When a number of other strains of O-157 were
examined by PCR using the primers, it was found that a
specific band was detected in some strains, whereas not in
others. This result indicates the presence or absence of a
specific sequence depending on the strain and makes it
possible to identify regions containing many differences
between the strains. It was made possible to classify the
strains of O-157 by using the primers amplifying the regions.
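
The classification idea in this example, grouping strains by
which primer pairs yield a band, can be sketched in Python as
follows. The strain names, the number of primer pairs and the
band results are hypothetical placeholders, and the function
group_by_band_pattern is an illustrative helper, not a method
described in the source.

```python
from collections import defaultdict


def group_by_band_pattern(results):
    """Group strains that share the same presence/absence band pattern.

    results: dict mapping strain name -> list of booleans, one per
    primer pair (True = band detected, False = no band).
    """
    groups = defaultdict(list)
    for strain, bands in results.items():
        groups[tuple(bands)].append(strain)
    return {pattern: sorted(strains) for pattern, strains in groups.items()}


# Hypothetical PCR results for three primer pairs.
toy_results = {
    "strain_A": [True, False, True],
    "strain_B": [True, False, True],
    "strain_C": [False, False, True],
}
for pattern, strains in group_by_band_pattern(toy_results).items():
    print(pattern, strains)
```

Strains with identical patterns fall into the same group, so each
distinct pattern corresponds to one putative type.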
[0079]

Example 4: [...]

The genetic information obtained in Example 1 was
analyzed, which suggested the presence of a salicylic acid
degradation gene specifically present in O-157. Accordingly,
a medium comprising salicylic acid as a carbon source was
prepared, making use of the function of the salicylic acid
degradation gene, to perform a culture experiment. As a
result, it was shown that O-157 could proliferate in the medium
and that there was a possibility that O-157 could be
selectively isolated using the medium.
[0080]

Example 5: [...] sequence to diagnosis

The genetic information obtained in Example 1 was
analyzed, resulting in finding the presence of mutations in
the coding sequence of the β-glucuronidase gene (SEQ ID NO: 1865)
and the coding sequence of the gene of specific PTS enzyme IIB
and IIC (SEQ ID NO: 1866). The mutations included a frame-shift
mutation. Accordingly, an oligonucleotide primer against
these mutations was synthesized to detect O-157 and other
strains by PCR using the primer. As a result, absence of
β-glucuronidase and decrease of availability of sorbitol could
be confirmed without cultivation of the bacteria. A primer for
detecting the tellurite resistance gene was synthesized to
perform PCR in the same way. As a result, a mutation in the
tellurite resistance gene could be detected. Furthermore, by
PCR using a combination of the three types of primers,
diagnostic results of higher accuracy were obtained. According
to Example 5, it was shown that these primers may be applied to
rapid diagnosis of O-157.
[0081]

Example 6: Expression of a polypeptide

A gene of a bacterial surface protein which was
specifically present in O-157 was cloned to construct a system
for mass production of a recombinant protein. The
recombinant protein was purified using this system to construct
a system for determining an antibody in a patient's serum. It
was shown that this system was usable for serodiagnosis of
O-157.
[0082]

Example 7: Application of a nucleotide sequence for diagnosis

Based on the information of the nucleotide sequence
determined in Example 1, a newly found toxin gene was cloned
to construct a system for mass production of a recombinant
protein. The recombinant protein may be purified using this
system, analyzed for a function of the toxin and searched for an
inhibitor thereof. Based on the information of the purified
protein and an amino acid sequence thereof, it is possible to
determine [their] conformation to design an inhibitory
substance and to synthesize [the inhibitory substance]. The
inhibitory substance will be a therapeutic agent of a different
type from conventional antibiotics.
[0083]

Example 8: DNA Vaccine

A gene of a bacterial surface protein which was
specifically present in O-157 was cloned into a vaccine strain
of salmonella to confirm that the O-157 specific bacterial
surface protein is expressed at the surface of the vaccine
strain of salmonella. The vaccine strain is usable as a vaccine
against O-157.

[0084]

Example 9: Live attenuated vaccine

A nucleic-acid molecule encoding a bacterial surface
protein which was specifically present in O-157 was inserted
into an expression vector suitable for salmonella to clone [the
expression vector] into a vaccine strain of attenuated
salmonella. Then, it was confirmed that the O-157 specific
surface protein was expressed at the surface of the vaccine
strain of salmonella. The vaccine strain is usable as a live
vaccine against O-157.
[0085]

Example 10: DNA microarray

An O-157 specific gene was amplified by PCR to prepare a
DNA chip according to a conventional method. mRNAs were
prepared from bacterial cells of O-157 which were cultured
under various culture conditions and analyzed using the DNA
chip. As a result, it will be possible to perform various
studies, such as [a study] of the regulatory mechanism of
expression of an O-157 gene and [a study] of [confirming]
whether a gene is expressed, or not, under a certain condition.
[0086]

[Industrial applicability]

The present invention provides a nucleotide sequence and
a polypeptide encoded thereby which are specific to
enterohemorrhagic E. coli O157:H7. These may be useful for
detection and/or therapy of infection. In addition, the present
invention provides a vaccine composition for preventing or
treating O-157 infection. Furthermore, the present invention
has a possibility of providing a method of screening a novel
pharmaceutical agent and a food additive, and a method of
preventing and/or treating a pathosis relating to O-157.

ABSTRACT
[Problems to be solved]

Providing a nucleic-acid molecule, a polypeptide, genetic
information thereof and a method of using them which may be
useful for detection and therapy of enterohemorrhagic
pathogenic E. coli O157:H7 infection.
[Means to solve the problem]

Revealing genetic information of novel nucleic acid
molecules specific to O-157, novel genes included in the nucleic
acid molecules, and novel polypeptides encoded by the genes.